Original URL: http://stackoverflow.com/questions/33238091/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Test if children tag exists in beautifulsoup
Question by The Bndr
I have XML files with a defined structure but a varying number of tags, like:
file1.xml:
<document>
  <subDoc>
    <id>1</id>
    <myId>1</myId>
  </subDoc>
</document>
file2.xml:
<document>
  <subDoc>
    <id>2</id>
  </subDoc>
</document>
Now I'd like to check whether the tag myId exists. So I did the following:
from bs4 import BeautifulSoup
data = open("file1.xml", 'r').read()
xml = BeautifulSoup(data)
hasAttrBs = xml.document.subdoc.has_attr('myID')
hasAttrPy = hasattr(xml.document.subdoc,'myID')
hasType = type(xml.document.subdoc.myid)
The results for file1.xml:
hasAttrBs -> False
hasAttrPy -> True
hasType -> <class 'bs4.element.Tag'>
file2.xml:
hasAttrBs -> False
hasAttrPy -> True
hasType -> <type 'NoneType'>
Okay, <myId> is not an attribute of <subdoc>.
But how can I test whether a sub-tag exists?
//Edit: By the way, I don't really want to iterate through the whole subdoc, because that would be very slow. I'm hoping to find a way to address/query that element directly.
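The results above can be reproduced offline. A minimal sketch, assuming bs4 is installed, the built-in "html.parser" is used, and the two sample files are inlined as strings (FILE1, FILE2 and the helper probe() are hypothetical names). has_attr() only inspects XML attributes, hasattr() is always True because attribute access on a Tag falls back to find(), and find() is the call that actually tells the two files apart.

```python
from bs4 import BeautifulSoup

# Inline copies of the two sample files from the question (hypothetical names).
FILE1 = "<document><subDoc><id>1</id><myId>1</myId></subDoc></document>"
FILE2 = "<document><subDoc><id>2</id></subDoc></document>"


def probe(markup):
    # "html.parser" lowercases tag names, so subDoc becomes subdoc.
    subdoc = BeautifulSoup(markup, "html.parser").document.subdoc
    return {
        # has_attr() looks at XML attributes such as <subDoc foo="...">.
        "has_attr": subdoc.has_attr("myid"),
        # Tag attribute access falls back to find(), so this never raises.
        "hasattr": hasattr(subdoc, "myid"),
        # find() returns the child Tag, or None if it is missing.
        "find": subdoc.find("myid"),
    }


r1, r2 = probe(FILE1), probe(FILE2)
print(r1["has_attr"], r1["hasattr"], r1["find"])  # False True <myid>1</myid>
print(r2["has_attr"], r2["hasattr"], r2["find"])  # False True None
```

Only the "find" entry differs between the two files, which is why the answers below all rely on find() or on attribute access (which calls find() internally).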
Accepted answer by wpercy
The simplest way to find out whether a child tag exists is:
childTag = xml.find('childTag')
if childTag:
    pass  # do stuff
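One caveat is worth checking with this approach. "html.parser" lowercases tag names, while find() compares names exactly, so a camel-case name such as myId or childTag will not match. A minimal sketch, assuming the built-in "html.parser" and an inlined copy of file1.xml:

```python
from bs4 import BeautifulSoup

FILE1 = "<document><subDoc><id>1</id><myId>1</myId></subDoc></document>"
soup = BeautifulSoup(FILE1, "html.parser")

# The parser stored the tag as "myid", and find() matches names exactly.
camel = soup.find("myId")
lower = soup.find("myid")
print(camel)  # None
print(lower)  # <myid>1</myid>
```

If the original case matters, parsing with BeautifulSoup(data, "xml") keeps it, but that parser requires lxml to be installed.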
More specifically, for the OP's question:
If you don't know the structure of the XML doc, you can use the soup's .find() method, something like this:
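from bs4 import BeautifulSoup

with open("file1.xml", 'r') as data, open("file2.xml", 'r') as data2:
    xml = BeautifulSoup(data.read(), "html.parser")
    xml2 = BeautifulSoup(data2.read(), "html.parser")
    # HTML parsers lowercase tag names, so search for "myid", not "myId"
    hasAttrBs = xml.find("myid")
    hasAttrBs2 = xml2.find("myid")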
If you do know the structure, you can get the desired element by accessing tag names as attributes, like xml.document.subdoc.myid. The whole thing would then look something like this:
from bs4 import BeautifulSoup

with open("file1.xml", 'r') as data, open("file2.xml", 'r') as data2:
    xml = BeautifulSoup(data.read(), "html.parser")
    xml2 = BeautifulSoup(data2.read(), "html.parser")
    hasAttrBs = xml.document.subdoc.myid
    hasAttrBs2 = xml2.document.subdoc.myid
print(hasAttrBs)
print(hasAttrBs2)
Prints:
<myid>1</myid>
None
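Attribute chaining has one pitfall that this output does not show. Only the last missing tag comes back as None. If an intermediate tag such as subdoc is itself missing, the next attribute access is made on None and raises AttributeError. A minimal sketch with a hypothetical third file that lacks <subDoc>:

```python
from bs4 import BeautifulSoup

# Hypothetical third file in which <subDoc> itself is missing.
FILE3 = "<document><other>3</other></document>"
soup = BeautifulSoup(FILE3, "html.parser")

subdoc = soup.document.subdoc
print(subdoc)  # None: the missing tag itself comes back as None

try:
    soup.document.subdoc.myid  # attribute access on None
    chain_ok = True
except AttributeError as exc:
    chain_ok = False
    print("AttributeError:", exc)

# find() on the whole soup does not need the intermediate tags to exist.
my_id = soup.find("myid")
print(my_id)  # None
```

When the structure is only partly known, the find() variant above is the safer of the two.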
Answer by chyoo CHENG
You can handle it like this:
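def has_my_id(xml):  # helper name chosen for illustration
    for child in xml.document.subdoc.children:
        if child.name == 'myid':  # HTML parsers lowercase tag names
            return True
    return False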
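This loop only looks at the direct children of subdoc and stops at the first match. The same idea can be written as a generator passed to any(), which also stops early. A sketch, with has_direct_child() as a hypothetical helper name and the sample files inlined:

```python
from bs4 import BeautifulSoup

FILE1 = "<document><subDoc><id>1</id><myId>1</myId></subDoc></document>"
FILE2 = "<document><subDoc><id>2</id></subDoc></document>"


def has_direct_child(tag, name):
    # Hypothetical helper: scans only the direct children of `tag` and
    # stops at the first match. Text nodes have .name == None, so they
    # never match. Names must be lowercase with "html.parser".
    return any(child.name == name for child in tag.children)


results = []
for markup in (FILE1, FILE2):
    subdoc = BeautifulSoup(markup, "html.parser").document.subdoc
    results.append(has_direct_child(subdoc, "myid"))
print(results)  # [True, False]
```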
Answer by ahuigo
if tag.find('child_tag_name'):
    pass  # the child tag exists
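find() returns None when nothing matches. bs4 Tag objects are always truthy, even when they are empty, so this one-liner also works for empty elements. An explicit "is not None" check states the intent more plainly. A small sketch with a hypothetical file whose <myId> element is empty:

```python
from bs4 import BeautifulSoup

# Hypothetical file whose <myId> element is present but empty.
soup = BeautifulSoup("<document><subDoc><myId></myId></subDoc></document>",
                     "html.parser")
child = soup.find("myid")

print(len(child))         # 0: the tag has no contents
print(bool(child))        # True: an empty Tag is still truthy
print(child is not None)  # True: the tag was found
if child is not None:     # explicit None check states the intent
    print("myid exists")
```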
Answer by Mona Jalal
Here's an example that checks whether an h2 tag exists on an Instagram page. Hope you find it useful:
import requests
from bs4 import BeautifulSoup

instagram_url = 'https://www.instagram.com/p/BHijrYFgX2v/?taken-by=findingmero'
html_source = requests.get(instagram_url).text
soup = BeautifulSoup(html_source, "lxml")
if not soup.find('h2'):
    print("didn't find h2")
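This example needs network access and lxml, and the live page's markup can change over time. A minimal offline sketch of the same check, using hypothetical stand-in HTML and the built-in "html.parser" instead of lxml:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup so the check runs without a network request.
html_source = ("<html><body><h1>Title</h1>"
               "<p>No second-level heading here.</p></body></html>")
soup = BeautifulSoup(html_source, "html.parser")

found_h2 = soup.find("h2") is not None
if not found_h2:
    print("didn't find h2")
```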
Answer by Kris Roofe
You can do it with if tag.myID:
If you want to check whether myID is a direct child (not a child of a child), use if tag.find("myID", recursive=False):
If you want to check whether tag has any child tags, use if tag.find(True): (and if not tag.find(True): to check that it has none).
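These variants differ in how far they search. Here is a sketch on a hypothetical document; the <leaf> element is an assumption added for illustration. The default find() also matches grandchildren, recursive=False restricts the search to direct children, and find(True) matches any tag at all.

```python
from bs4 import BeautifulSoup

markup = ("<document><subDoc><id>1</id><myId>1</myId></subDoc>"
          "<leaf></leaf></document>")
soup = BeautifulSoup(markup, "html.parser")
document = soup.document

# Default find() searches all descendants, so a grandchild counts.
deep = document.find("myid") is not None
# recursive=False only looks at direct children of <document>.
direct = document.find("myid", recursive=False) is not None
# find(True) matches any tag, so it tells whether a tag has child tags.
has_kids = document.find(True) is not None
leaf_has_kids = soup.leaf.find(True) is not None

print(deep, direct, has_kids, leaf_has_kids)  # True False True False
```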
Answer by user2458922
import requests
from bs4 import BeautifulSoup

page = requests.get("http://dataquestio.github.io/web-scraping-pages/simple.html")
soup = BeautifulSoup(page.content, 'html.parser')
testNode = list(soup.children)[1]

def hasChild(node):
    print(type(node))
    try:
        node.children  # raises AttributeError if node has no .children
        return True
    except AttributeError:
        return False

if hasChild(testNode):
    firstChild = list(testNode.children)[0]
    if hasChild(firstChild):
        print('I found Grand Child ')
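The try/except approach tests whether accessing .children raises. That is not the same as asking whether a node has children, and list(testNode.children)[0] raises IndexError on a tag with no children. An alternative sketch checks the type explicitly and asks for at least one direct child tag. It assumes the hypothetical helper name has_child_tag() and a small inline document.

```python
from bs4 import BeautifulSoup, Tag


def has_child_tag(node):
    # Hypothetical helper: True only for a Tag that has at least one
    # direct child *tag* (text-only children do not count).
    return isinstance(node, Tag) and node.find(True, recursive=False) is not None


soup = BeautifulSoup("<div><p>Here is some text.</p></div>", "html.parser")
div = soup.div
p = div.p
print(has_child_tag(div))            # True: <div> contains <p>
print(has_child_tag(p))              # False: <p> contains only text
print(has_child_tag(p.contents[0]))  # False: a NavigableString is not a Tag
```

This avoids both the bare exception handling and the IndexError on empty tags.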

