Python: test whether a child tag exists in BeautifulSoup

Notice: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/33238091/

Time: 2020-08-19 13:04:03  Source: igfitidea

Test if a child tag exists in BeautifulSoup

Tags: python, xml, testing, tags, beautifulsoup

Asked by The Bndr

I have XML files with a defined structure but a varying number of tags, for example:

file1.xml:

<document>
  <subDoc>
    <id>1</id>
    <myId>1</myId>
  </subDoc>
</document>

file2.xml:

<document>
  <subDoc>
    <id>2</id>
  </subDoc>
</document>

Now I would like to check whether the tag myId exists. So I did the following:

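from bs4 import BeautifulSoup

with open("file1.xml", 'r') as f:
    data = f.read()
xml = BeautifulSoup(data)

hasAttrBs = xml.document.subdoc.has_attr('myID')   # looks at attributes of <subdoc>, not child tags
hasAttrPy = hasattr(xml.document.subdoc, 'myID')   # unknown names fall back to find(), so this is True
hasType = type(xml.document.subdoc.myid)           # Tag if the child exists, NoneType otherwise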

The result for file1.xml is:

hasAttrBs -> False
hasAttrPy -> True
hasType ->   <class 'bs4.element.Tag'>

file2.xml:

hasAttrBs -> False
hasAttrPy -> True
hasType -> <type 'NoneType'>

Okay, <myId> is not an attribute of <subdoc>.

But how can I test whether a sub-tag exists?

//Edit: By the way, I don't really want to iterate through the whole subdoc, because that would be very slow. I hope to find a way to address or query that element directly.

Accepted answer by wpercy

The simplest way to check whether a child tag exists is:

childTag = xml.find('childTag')
if childTag is not None:
    # do stuff
    pass


More specifically to OP's question:

If you don't know the structure of the XML doc, you can use the .find() method of the soup. Something like this:

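from bs4 import BeautifulSoup

with open("file1.xml", 'r') as data, open("file2.xml", 'r') as data2:
    xml = BeautifulSoup(data.read())
    xml2 = BeautifulSoup(data2.read())

    # The HTML parsers lowercase tag names, so search for "myid";
    # the "xml" parser would preserve the original case ("myId").
    hasAttrBs = xml.find("myid")     # a Tag when the element exists
    hasAttrBs2 = xml2.find("myid")   # None when it does not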

If you do know the structure, you can get the desired element by accessing the tag name as an attribute, like this: xml.document.subdoc.myid. The result is None when the tag is missing. So the whole thing would go something like this:

from bs4 import BeautifulSoup

with open("file1.xml", 'r') as data, open("file2.xml", 'r') as data2:
    xml = BeautifulSoup(data.read())
    xml2 = BeautifulSoup(data2.read())

    # Attribute access is shorthand for find() on that tag name
    hasAttrBs = xml.document.subdoc.myid
    hasAttrBs2 = xml2.document.subdoc.myid
    print(hasAttrBs)
    print(hasAttrBs2)

This prints:

<myid>1</myid>
None
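
As a self-contained illustration of the find() approach above, here is a minimal sketch that parses inline copies of the two files from the question instead of reading them from disk. The helper name has_tag and the choice of "html.parser" are assumptions made for this example, not part of the original answer. Because that parser lowercases tag names, the helper lowercases the name it searches for.

```python
from bs4 import BeautifulSoup

# Inline copies of file1.xml and file2.xml from the question.
FILE1 = "<document><subDoc><id>1</id><myId>1</myId></subDoc></document>"
FILE2 = "<document><subDoc><id>2</id></subDoc></document>"


def has_tag(markup, name):
    """Hypothetical helper: True if a tag called `name` occurs anywhere in `markup`."""
    soup = BeautifulSoup(markup, "html.parser")
    # html.parser lowercases tag names, so normalize the name we look for.
    return soup.find(name.lower()) is not None


print(has_tag(FILE1, "myId"))  # True
print(has_tag(FILE2, "myId"))  # False
```

Comparing against None makes the intent explicit: find() returns either a Tag or None, so no iteration over the children is written by hand.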

Answer by chyoo CHENG

You can handle it like this:

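def has_child(parent, name):
    # e.g. has_child(xml.document.subdoc, 'myid'); checks direct children only
    for child in parent.children:
        if child.name == name:
            return True
    return False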

Answer by ahuigo

if tag.find('child_tag_name'):

Answer by Mona Jalal

Here's an example that checks whether an h2 tag exists on an Instagram page. Hope you find it useful:

import requests
from bs4 import BeautifulSoup

instagram_url = 'https://www.instagram.com/p/BHijrYFgX2v/?taken-by=findingmero'
html_source = requests.get(instagram_url).text
soup = BeautifulSoup(html_source, "lxml")

if not soup.find('h2'):
    print("didn't find h2")

Answer by Kris Roofe

You can do it with if tag.myID:

If you want to check whether myID is a direct child rather than a deeper descendant, use if tag.find("myID", recursive=False):

If you want to check whether the tag has any child tag at all, use if tag.find(True): (and if not tag.find(True): to test that it has none).
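
The three checks in this answer can be sketched side by side. This is only an illustration under stated assumptions: the markup is an inline, all-lowercase variant of the question's file1.xml (so the lowercasing done by "html.parser" does not matter), and the variable names are made up for the example.

```python
from bs4 import BeautifulSoup

# Lowercase variant of file1.xml from the question (assumption for this sketch).
MARKUP = "<document><subdoc><id>1</id><myid>1</myid></subdoc></document>"
soup = BeautifulSoup(MARKUP, "html.parser")
document = soup.find("document")
subdoc = document.find("subdoc")

# Default search: looks through all descendants of <document>.
nested = document.find("myid") is not None
# recursive=False: looks only at the direct children of <document>.
direct = document.find("myid", recursive=False) is not None
# find(True) matches any tag, so this asks "is there any child tag at all?".
subdoc_has_child_tags = subdoc.find(True) is not None
id_has_child_tags = subdoc.find("id").find(True) is not None

print(nested, direct, subdoc_has_child_tags, id_has_child_tags)
# True False True False
```

Here <myid> is a grandchild of <document>, which is why the recursive search finds it and the recursive=False search does not. <id> contains only text, so find(True) returns None for it.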

Answer by user2458922

import requests
from bs4 import BeautifulSoup

page = requests.get("http://dataquestio.github.io/web-scraping-pages/simple.html")
soup = BeautifulSoup(page.content, 'html.parser')
testNode = list(soup.children)[1]

def hasChild(node):
    print(type(node))
    try:
        node.children  # raises AttributeError if the node has no .children
        return True
    except AttributeError:
        return False

if hasChild(testNode):
    firstChild = list(testNode.children)[0]
    if hasChild(firstChild):
        print('I found Grand Child ')