Python: test whether a child tag exists in BeautifulSoup

Question

Question by The Bndr

I have XML files with a defined structure but a varying number of tags, for example:

file1.xml:

<document>
  <subDoc>
    <id>1</id>
    <myId>1</myId>
  </subDoc>
</document>

file2.xml:

<document>
  <subDoc>
    <id>2</id>
  </subDoc>
</document>

Now I'd like to check whether the tag myId exists, so I did the following:

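from bs4 import BeautifulSoup

with open("file1.xml", 'r') as f:
    data = f.read()
# HTML parsers lowercase tag names: subDoc -> subdoc, myId -> myid
xml = BeautifulSoup(data, "html.parser")

hasAttrBs = xml.document.subdoc.has_attr('myID')
hasAttrPy = hasattr(xml.document.subdoc, 'myID')
hasType = type(xml.document.subdoc.myid)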

The results for file1.xml:

hasAttrBs -> False
hasAttrPy -> True
hasType ->   <class 'bs4.element.Tag'>

file2.xml:

hasAttrBs -> False
hasAttrPy -> True
hasType -> <type 'NoneType'>

Okay, <myId> is not an attribute of <subdoc>.
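
A minimal sketch of why the three checks behave this way, assuming html.parser (which lowercases tag names) and the contents of file1.xml and file2.xml inlined as strings: has_attr() inspects XML attributes such as name="value" on the tag itself, while attribute-style access like subdoc.myid falls back to .find() and returns None instead of raising, so Python's hasattr() is True whether or not the child exists.

```python
from bs4 import BeautifulSoup

# Inlined copies of file1.xml and file2.xml from the question
FILE1 = "<document><subDoc><id>1</id><myId>1</myId></subDoc></document>"
FILE2 = "<document><subDoc><id>2</id></subDoc></document>"

def check(markup):
    soup = BeautifulSoup(markup, "html.parser")  # lowercases tag names
    subdoc = soup.document.subdoc
    return {
        # has_attr() inspects XML attributes (name="value"), not child tags
        "has_attr": subdoc.has_attr("myid"),
        # Tag.__getattr__ falls back to find(), so no AttributeError is raised
        "hasattr": hasattr(subdoc, "myid"),
        # Direct lookup of the child tag: a Tag, or None when it is missing
        "child": subdoc.find("myid"),
    }

r1, r2 = check(FILE1), check(FILE2)
print(r1["has_attr"], r1["hasattr"], r1["child"])  # False True <myid>1</myid>
print(r2["has_attr"], r2["hasattr"], r2["child"])  # False True None
```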

But how can I test whether a sub-tag exists?

//Edit: By the way, I'd rather not iterate through the whole subdoc, because that would be very slow. I hope to find a way to address/query that element directly.

Answer 1

Accepted answer by wpercy

The simplest way to check whether a child tag exists is:

childTag = xml.find('childTag')
if childTag:
    pass  # do stuff with childTag

More specifically, for the OP's question:

If you don't know the structure of the XML document, you can use the soup's .find() method, something like this:

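from bs4 import BeautifulSoup

with open("file1.xml", 'r') as data, open("file2.xml", 'r') as data2:
    # "html.parser" lowercases tag names, so search for "myid", not "myId"
    xml = BeautifulSoup(data.read(), "html.parser")
    xml2 = BeautifulSoup(data2.read(), "html.parser")

    hasAttrBs = xml.find("myid")    # <myid>1</myid>
    hasAttrBs2 = xml2.find("myid")  # None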

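If you do know the structure, you can get the desired element by accessing tag names as attributes, as in xml.document.subdoc.myid. Each step is shorthand for calling .find() with that name, so the final lookup returns None if the tag is missing. The whole thing would go something like this: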

from bs4 import BeautifulSoup

with open("file1.xml", 'r') as data, open("file2.xml", 'r') as data2:
    xml = BeautifulSoup(data.read(), "html.parser")
    xml2 = BeautifulSoup(data2.read(), "html.parser")

    hasAttrBs = xml.document.subdoc.myid
    hasAttrBs2 = xml2.document.subdoc.myid
    print(hasAttrBs)
    print(hasAttrBs2)

Prints

<myid>1</myid>
None
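
A self-contained version of the find() approach, with the two files inlined as strings to avoid reading from disk (an assumption for the sketch). With html.parser the stored tag names are lowercase, so the search name must be lowercase as well; the "xml" parser (which requires lxml) keeps the original case, and there find("myId") would be the right spelling. has_my_id is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Inlined copies of file1.xml and file2.xml from the question
FILE1 = "<document><subDoc><id>1</id><myId>1</myId></subDoc></document>"
FILE2 = "<document><subDoc><id>2</id></subDoc></document>"

def has_my_id(markup):
    """Return True if <myId> exists anywhere in the document (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    # html.parser stores tag names in lowercase, so search for "myid"
    return soup.find("myid") is not None

print(has_my_id(FILE1), has_my_id(FILE2))  # True False

# The mixed-case spelling does not match the lowercased tag name
print(BeautifulSoup(FILE1, "html.parser").find("myId"))  # None
```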

Answer 2

Answer by chyoo CHENG

You can handle it like this:

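def has_my_id(xml):
    for child in xml.document.subdoc.children:  # direct children only
        if child.name == 'myid':  # HTML parsers lowercase tag names; text nodes have name None
            return True
    return False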
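
A sketch comparing the loop with a direct-children search, assuming html.parser and file1.xml inlined as a string: the loop only looks at direct children of <subdoc>, and find(name, recursive=False) restricts its search to exactly those children, so both report the same result. find() still scans the children internally; it just saves writing the loop.

```python
from bs4 import BeautifulSoup

# Inlined copy of file1.xml, keeping the newlines and indentation between tags
FILE1 = "<document><subDoc>\n  <id>1</id>\n  <myId>1</myId>\n</subDoc></document>"

soup = BeautifulSoup(FILE1, "html.parser")  # tag names become lowercase
subdoc = soup.document.subdoc

# .children yields only direct children, including whitespace text nodes,
# and text nodes report their name as None
names = [child.name for child in subdoc.children]
print(names)

# The answer's loop written with any(); it stops at the first match
found_by_loop = any(child.name == "myid" for child in subdoc.children)

# find(..., recursive=False) examines the same direct children
found_by_find = subdoc.find("myid", recursive=False) is not None

print(found_by_loop, found_by_find)  # True True
```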

Answer 3

Answer by ahuigo

if tag.find('child_tag_name'):
    pass  # the child tag exists
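
A small sketch of why the plain truthiness check is safe, assuming html.parser and a hypothetical empty <myId></myId> element: find() returns the first matching descendant or None, and bs4 makes every Tag truthy even when it has no contents, so if tag.find(...) and if tag.find(...) is not None agree.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: <myId> is present but empty
soup = BeautifulSoup("<subDoc><id>3</id><myId></myId></subDoc>", "html.parser")
tag = soup.subdoc

found = tag.find("myid")
print(len(found))           # 0: the tag has no contents
print(bool(found))          # True: bs4 Tag objects are always truthy
print(found is not None)    # True: the explicit check agrees
print(tag.find("missing"))  # None
```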

Answer 4

Answer by Mona Jalal

Here's an example that checks whether an h2 tag exists on an Instagram page. Hope you find it useful:

import requests
from bs4 import BeautifulSoup

instagram_url = 'https://www.instagram.com/p/BHijrYFgX2v/?taken-by=findingmero'
html_source = requests.get(instagram_url).text
soup = BeautifulSoup(html_source, "lxml")

if not soup.find('h2'):
    print("didn't find h2")
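
To try the same check without a network request, here is a minimal sketch that swaps the downloaded page for hypothetical inline HTML strings and uses the built-in "html.parser" instead of "lxml", so no extra parser package is needed; has_tag is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

def has_tag(html_source, name):
    """Return True if at least one <name> element occurs in html_source."""
    soup = BeautifulSoup(html_source, "html.parser")
    return soup.find(name) is not None

# Hypothetical stand-ins for a downloaded page
with_heading = "<html><body><h2>Caption</h2><p>text</p></body></html>"
without_heading = "<html><body><p>text</p></body></html>"

print(has_tag(with_heading, "h2"))     # True
print(has_tag(without_heading, "h2"))  # False
```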

Answer 5

Answer by Kris Roofe

You can do it with if tag.myID:

If you want to check whether myID is a direct child (not a grandchild or deeper descendant), use if tag.find("myID", recursive=False):

If you want to check whether tag has any child tag at all, use if tag.find(True): (and if not tag.find(True): to check that it has none).
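
A sketch contrasting the three checks from this answer, assuming html.parser (so the name is written myid in lowercase) and hypothetical markup where <myId> sits one level below <subDoc>: attribute access and plain find() search all descendants, recursive=False looks only at direct children, and find(True) matches any tag.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: <myId> is a grandchild of <subDoc>, not a direct child
markup = "<subDoc><wrapper><myId>1</myId></wrapper></subDoc><leaf></leaf>"
soup = BeautifulSoup(markup, "html.parser")  # tag names become lowercase
subdoc = soup.find("subdoc")

# Attribute access is shorthand for find(), which searches all descendants
print(subdoc.myid is not None)                           # True
# recursive=False only examines direct children (<wrapper> here)
print(subdoc.find("myid", recursive=False) is not None)  # False
print(subdoc.find("wrapper").find("myid", recursive=False) is not None)  # True

# find(True) matches any tag, so it returns None only when there are no child tags
print(subdoc.find(True) is not None)             # True
print(soup.find("leaf").find(True) is not None)  # False
```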

Answer 6

Answer by user2458922

import requests
from bs4 import BeautifulSoup

page = requests.get("http://dataquestio.github.io/web-scraping-pages/simple.html")
soup = BeautifulSoup(page.content, 'html.parser')
testNode = list(soup.children)[1]

def hasChild(node):
    # True if node exposes a .children attribute (every Tag does, even an empty one)
    print(type(node))
    try:
        node.children
        return True
    except AttributeError:
        return False

if hasChild(testNode):
    firstChild = list(testNode.children)[0]
    if hasChild(firstChild):
        print('I found Grand Child')
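
The helper above returns True for every Tag, including one with no children, because any Tag has a .children attribute. A minimal sketch of a stricter check, assuming "has a child" should mean "has at least one direct child tag" and using inline markup instead of the network request; has_child_tag is a hypothetical name.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

def has_child_tag(node):
    """Return True only if node is a Tag with at least one direct child tag (hypothetical helper)."""
    return isinstance(node, Tag) and node.find(True, recursive=False) is not None

soup = BeautifulSoup("<div><p>text</p></div><span></span>", "html.parser")
div, span = soup.find("div"), soup.find("span")

print(has_child_tag(div))           # True: <p> is a direct child
print(has_child_tag(div.p))         # False: <p> contains only text
print(has_child_tag(span))          # False: empty tag
print(has_child_tag(div.p.string))  # False: a NavigableString is not a Tag
```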
