Python: test if children tag exists in BeautifulSoup
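Disclaimer: This page is a Chinese-English translation of a popular Stack Overflow question, published under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow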
Original URL: http://stackoverflow.com/questions/33238091/
Asked by The Bndr
I have an XML file with a defined structure but a varying number of tags, like:
file1.xml:
<document>
<subDoc>
<id>1</id>
<myId>1</myId>
</subDoc>
</document>
file2.xml:
<document>
<subDoc>
<id>2</id>
</subDoc>
</document>
Now I'd like to check whether the tag myId exists. So I did the following:
from bs4 import BeautifulSoup

with open("file1.xml", 'r') as f:
    data = f.read()
xml = BeautifulSoup(data)
hasAttrBs = xml.document.subdoc.has_attr('myID')
hasAttrPy = hasattr(xml.document.subdoc, 'myID')
hasType = type(xml.document.subdoc.myid)
The result for file1.xml is:
hasAttrBs -> False
hasAttrPy -> True
hasType -> <class 'bs4.element.Tag'>
file2.xml:
hasAttrBs -> False
hasAttrPy -> True
hasType -> <type 'NoneType'>
Okay, <myId> is not an attribute of <subdoc>.
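The three results above follow from how BeautifulSoup resolves attribute access on a Tag: an unknown attribute name such as .myid is turned into a .find() call, which returns None instead of raising AttributeError. The sketch below reproduces this on inline copies of the two files. It assumes the built-in "html.parser" (which lowercases tag names), and describe() is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Inline copies of file1.xml and file2.xml from the question.
FILE1 = "<document><subDoc><id>1</id><myId>1</myId></subDoc></document>"
FILE2 = "<document><subDoc><id>2</id></subDoc></document>"


def describe(markup):
    """Hypothetical helper: run the question's three checks on one document."""
    # html.parser lowercases tag names: subDoc -> subdoc, myId -> myid.
    soup = BeautifulSoup(markup, "html.parser")
    subdoc = soup.document.subdoc
    return {
        # has_attr() inspects XML attributes (<subDoc key="...">), not child tags.
        "has_attr": subdoc.has_attr("myid"),
        # subdoc.myid is resolved via find(), which returns None rather than
        # raising AttributeError, so hasattr() is True for both files.
        "hasattr": hasattr(subdoc, "myid"),
        # The find() result itself is what distinguishes the two files.
        "child_exists": subdoc.myid is not None,
    }


print(describe(FILE1))
print(describe(FILE2))
```

So hasattr() cannot tell the two files apart; comparing the lookup result against None can.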
But how can I test whether a sub-tag exists?
//Edit: By the way, I don't really want to iterate through the whole subdoc, because that would be very slow. I'm hoping to find a way to address/query that element directly.
Accepted answer by wpercy
The simplest way to check whether a child tag exists is:
childTag = xml.find('childTag')
if childTag:
    # do stuff
    pass
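A minimal runnable sketch of the find() check, assuming inline markup and the built-in "html.parser". It compares the result with is not None so the intent is explicit; has_descendant() is a hypothetical helper name.

```python
from bs4 import BeautifulSoup


def has_descendant(markup, name):
    """Hypothetical helper: True if a tag called `name` appears anywhere in `markup`."""
    soup = BeautifulSoup(markup, "html.parser")
    # find() returns the first matching Tag, or None when nothing matches.
    return soup.find(name) is not None


print(has_descendant("<document><subdoc><myid>1</myid></subdoc></document>", "myid"))  # True
print(has_descendant("<document><subdoc><id>2</id></subdoc></document>", "myid"))  # False
```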
More specifically, for the OP's question:
If you don't know the structure of the XML doc, you can use the soup's .find() method. Something like this:
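from bs4 import BeautifulSoup

with open("file1.xml", 'r') as data, open("file2.xml", 'r') as data2:
    xml = BeautifulSoup(data.read(), "html.parser")
    xml2 = BeautifulSoup(data2.read(), "html.parser")

# HTML parsers lowercase tag names, so search for "myid" rather than "myId"
hasAttrBs = xml.find("myid")
hasAttrBs2 = xml2.find("myid")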
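One detail to watch with find(): the name you pass must match the tag name as the parser stored it. With an HTML parser such as the built-in "html.parser", <myId> is stored as myid, and find() compares names case-sensitively. A small sketch, assuming an inline copy of file1.xml:

```python
from bs4 import BeautifulSoup

# Inline copy of file1.xml; note the mixed-case tag names.
FILE1 = "<document><subDoc><id>1</id><myId>1</myId></subDoc></document>"

soup = BeautifulSoup(FILE1, "html.parser")

# html.parser stores tag names in lowercase...
print(soup.find("myid"))  # <myid>1</myid>
# ...and find() matches names case-sensitively, so the original spelling misses.
print(soup.find("myId"))  # None
```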
If you do know the structure, you can get the desired element by accessing the tag name as an attribute, like this: xml.document.subdoc.myid. So the whole thing would go something like this:
from bs4 import BeautifulSoup

with open("file1.xml", 'r') as data, open("file2.xml", 'r') as data2:
    xml = BeautifulSoup(data.read(), "html.parser")
    xml2 = BeautifulSoup(data2.read(), "html.parser")

hasAttrBs = xml.document.subdoc.myid
hasAttrBs2 = xml2.document.subdoc.myid
print(hasAttrBs)
print(hasAttrBs2)
Prints:
<myid>1</myid>
None
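Attribute chains like xml.document.subdoc.myid return None only for the last missing step; if an intermediate tag such as subdoc is missing, the next lookup runs on None and raises AttributeError. A sketch of one way to guard against that, using a hypothetical find_path() helper that stops at the first missing tag (the sample markup is made up, parsed with the built-in "html.parser"):

```python
from bs4 import BeautifulSoup


def find_path(soup, *names):
    """Hypothetical helper: like soup.a.b.c, but returns None at the first missing tag."""
    node = soup
    for name in names:
        # Like attribute access, find() searches all descendants of `node`.
        node = node.find(name)
        if node is None:
            return None  # stop before calling .find() on None
    return node


# Made-up document that has no <subdoc> at all.
soup = BeautifulSoup("<document><id>3</id></document>", "html.parser")

print(find_path(soup, "document", "subdoc", "myid"))  # None
try:
    soup.document.subdoc.myid  # subdoc is None here, so .myid fails
except AttributeError as exc:
    print("chained access failed:", exc)
```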
Answer by chyoo CHENG
You can handle it like this:
found = False
for child in xml.document.subdoc.children:
    if child.name == 'myid':  # HTML parsers lowercase tag names
        found = True
        break
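The same loop can be written with any(), which also stops at the first match. A sketch, assuming the built-in "html.parser" and a hypothetical has_direct_child() helper; note that .children only covers direct children, not deeper descendants.

```python
from bs4 import BeautifulSoup


def has_direct_child(tag, name):
    """Hypothetical helper: True if `tag` has a direct child tag called `name`."""
    # .children yields Tags and strings; in current bs4, strings report name None.
    return any(child.name == name for child in tag.children)


soup = BeautifulSoup(
    "<document><subdoc><id>1</id><myid>1</myid></subdoc></document>", "html.parser"
)
subdoc = soup.find("subdoc")
print(has_direct_child(subdoc, "myid"))  # True
print(has_direct_child(subdoc, "missing"))  # False
```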
Answer by ahuigo
if tag.find('child_tag_name'):
    pass  # the child tag exists
Answer by Mona Jalal
Here's an example that checks whether an h2 tag exists on an Instagram page. Hope you find it useful:
import requests
from bs4 import BeautifulSoup

instagram_url = 'https://www.instagram.com/p/BHijrYFgX2v/?taken-by=findingmero'
html_source = requests.get(instagram_url).text
soup = BeautifulSoup(html_source, "lxml")
if not soup.find('h2'):
    print("didn't find h2")
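The same h2 check can be tried without a network request by separating fetching from parsing. A sketch, assuming the built-in "html.parser" instead of lxml, a hypothetical page_has_h2() helper, and made-up HTML strings:

```python
from bs4 import BeautifulSoup


def page_has_h2(html_source):
    """Hypothetical helper: the answer's h2 check on an already-downloaded page."""
    soup = BeautifulSoup(html_source, "html.parser")
    return soup.find("h2") is not None


print(page_has_h2("<html><body><h2>Title</h2></body></html>"))  # True
print(page_has_h2("<html><body><p>No heading here</p></body></html>"))  # False
```

Keeping the parsing step in its own function makes it easy to test against saved HTML before pointing it at a live URL.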
Answer by Kris Roofe
You can do it with if tag.myID:
If you want to check whether myID is a direct child (not a child of a child), use if tag.find("myID", recursive=False):
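To see the difference recursive=False makes, here is a small sketch with a made-up <wrapper> tag between subdoc and myid, parsed with the built-in "html.parser". With that parser tag names are lowercase, so tag.myID would have to be spelled tag.myid.

```python
from bs4 import BeautifulSoup

# Made-up markup: myid is a grandchild of subdoc, not a direct child.
soup = BeautifulSoup("<subdoc><wrapper><myid>1</myid></wrapper></subdoc>", "html.parser")
subdoc = soup.find("subdoc")

# The default search visits every descendant, so the grandchild is found.
print(subdoc.find("myid") is not None)  # True
# recursive=False only looks at direct children, so it is not.
print(subdoc.find("myid", recursive=False) is not None)  # False
```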
If you want to check whether tag has no child tags, use if not tag.find(True):
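find(True) matches any tag, so it returns the first descendant tag or None; text content alone does not count as a child tag. A sketch with made-up markup, the built-in "html.parser", and a hypothetical has_child_tags() helper:

```python
from bs4 import BeautifulSoup


def has_child_tags(tag):
    """Hypothetical helper: True if `tag` contains at least one tag."""
    # find(True) matches any tag name; plain text is not a tag, so it is ignored.
    return tag.find(True) is not None


soup = BeautifulSoup("<subdoc><id>1</id></subdoc><note>just text</note>", "html.parser")
print(has_child_tags(soup.find("subdoc")))  # True
print(has_child_tags(soup.find("note")))  # False
```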
Answer by user2458922
import requests
from bs4 import BeautifulSoup

page = requests.get("http://dataquestio.github.io/web-scraping-pages/simple.html")
soup = BeautifulSoup(page.content, 'html.parser')
testNode = list(soup.children)[1]

def hasChild(node):
    print(type(node))
    try:
        node.children  # raises AttributeError if node has no .children
        return True
    except AttributeError:
        return False

if hasChild(testNode):
    firstChild = list(testNode.children)[0]
    if hasChild(firstChild):
        print('I found Grand Child ')
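A type check with isinstance() states the intent directly instead of probing for .children. Note also that hasChild() returns True for any Tag, even one with no children. A sketch, assuming the built-in "html.parser", made-up markup, and hypothetical helpers is_tag() and has_grandchild():

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def is_tag(node):
    """Hypothetical helper: only Tag objects can contain other elements."""
    return isinstance(node, Tag)


def has_grandchild(node):
    """Hypothetical helper: True if some child tag of `node` has a child tag itself."""
    if not is_tag(node):
        return False
    return any(
        child.find(True, recursive=False) is not None
        for child in node.find_all(True, recursive=False)
    )


soup = BeautifulSoup("<html><body><p>Some text</p></body></html>", "html.parser")
print(has_grandchild(soup.find("html")))  # True: html -> body -> p
print(has_grandchild(soup.find("p")))  # False: p only contains text
```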