Python BeautifulSoup parent tag

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/22023992/

Date: 2020-08-19 00:05:50. Source: igfitidea

BeautifulSoup parent tag

Tags: python, html-parsing, beautifulsoup

Asked by porteclefs

I have some html that I want to extract text from. Here's an example of the html:

<p>TEXT I WANT <i> &#8211; </i></p>

Now, there are, obviously, lots of <p> tags in this document, so find('p') is not a good way to get at the text I want to extract. However, that <i> tag is the only one in the document, so I thought I could just find the <i> and then go to its parent.

I've tried:

up = soup.select('p i').parent

and

up = soup.select('i')
print(up.parent)

and I've also tried .parents, find_all('i') and find('i')... but I always get:

'list' object has no attribute "parent"

What am I doing wrong?

Accepted answer by Totem

This works:

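from bs4 import BeautifulSoup
soup = BeautifulSoup('<p>TEXT I WANT <i> &#8211; </i></p>', 'html.parser')
i_tag = soup.find('i')
my_text = str(i_tag.previous_sibling).strip()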

output:

'TEXT I WANT'

As mentioned in other answers, find_all() returns a list, whereas find() returns the first match or None.
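To see the difference concretely, here is a minimal sketch, assuming bs4 4.x with the built-in 'html.parser' backend (the variable names are illustrative only). find_all() gives back a ResultSet, which is a subclass of list, so asking it for .parent raises AttributeError; find() gives a single Tag, or None when nothing matches.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p>TEXT I WANT <i> &#8211; </i></p>', 'html.parser')

matches = soup.find_all('i')  # ResultSet: a list subclass, even for one match
first = soup.find('i')        # a single Tag
missing = soup.find('b')      # no <b> in this document, so None

try:
    matches.parent            # the mistake from the question
    raised = False
except AttributeError:
    raised = True

print(type(matches).__name__, len(matches))  # ResultSet 1
print(first.parent.name)                     # p
print(missing, raised)                       # None True
```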

If you are unsure whether an <i> tag is present, you can simply wrap the lookup in a try/except block.
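As an alternative to try/except, the same idea can be wrapped in a function with explicit None checks. This is a sketch under assumptions: text_before_i is a hypothetical helper name, and 'html.parser' is assumed as the backend. Checking the previous sibling as well matters because str(None).strip() would otherwise silently return the string 'None' when <i> is the first child of its parent.

```python
from bs4 import BeautifulSoup, NavigableString


def text_before_i(html):
    """Hypothetical helper: stripped text node just before the first <i>, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    i_tag = soup.find('i')
    if i_tag is None:
        return None  # no <i> element in the document
    prev = i_tag.previous_sibling
    # Only accept a text node; the sibling may also be None or another tag
    if not isinstance(prev, NavigableString):
        return None
    return str(prev).strip()


print(text_before_i('<p>TEXT I WANT <i> &#8211; </i></p>'))  # TEXT I WANT
print(text_before_i('<p>no italics here</p>'))               # None
```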

Answer by Martijn Pieters

find_all() returns a list. find('i') returns the first matching element, or None.

Thus, use:

try:
    up = soup.find('i').parent
except AttributeError:
    # no <i> element: find() returned None
    up = None

Demo:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<p>TEXT I WANT <i> &#8211; </i></p>')
>>> soup.find('i').parent
<p>TEXT I WANT <i> – </i></p>
>>> soup.find('i').parent.text
u'TEXT I WANT  \u2013 '
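As the demo shows, .text on the parent also includes the dash inside <i>. If only the text sitting directly inside <p> is wanted, one option (not part of the original answers) is to collect the parent's direct text children with find_all(string=True, recursive=False). A minimal sketch, assuming a reasonably recent bs4 (where the string argument is available) and 'html.parser':

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p>TEXT I WANT <i> &#8211; </i></p>', 'html.parser')
p = soup.find('i').parent

# .text joins every descendant string, so the dash inside <i> is included
full_text = p.text
# recursive=False keeps only text nodes that are direct children of <p>
own_text = ''.join(p.find_all(string=True, recursive=False)).strip()

print(repr(full_text))  # 'TEXT I WANT  – '
print(repr(own_text))   # 'TEXT I WANT'
```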

Answer by amaslenn

Both select() and find_all() return a list of elements, so you should iterate over them like this:

for el in soup.select('i'):
    print(el.parent.text)
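When the document contains several <i> tags, the same loop can collect every parent's text into a list. A minimal sketch, assuming bs4 with 'html.parser' and the soupsieve package that select() relies on (installed alongside recent bs4 releases); the sample HTML is made up for illustration:

```python
from bs4 import BeautifulSoup

html = '<p>first <i>a</i></p><p>no italics</p><p>second <i>b</i></p>'
soup = BeautifulSoup(html, 'html.parser')

# select() returns a list, so build the result with a comprehension over it
parent_texts = [el.parent.text for el in soup.select('i')]
print(parent_texts)  # ['first a', 'second b']
```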

Answer by Chad Frederick

soup.select() returns a Python list, so you have to take the element out of it. Unpacking works when exactly one element matches (otherwise it raises ValueError), e.g.:

>>> [up] = soup.select('i')
>>> print(up.parent)

or

>>> up = soup.select('i')
>>> print(up[0].parent)
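Instead of unpacking or indexing the list, recent bs4 releases also provide select_one(), which returns the first match for a CSS selector or None. This alternative is not from the original answers; a minimal sketch, assuming 'html.parser':

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p>TEXT I WANT <i> &#8211; </i></p>', 'html.parser')

# select_one() returns the first matching Tag (or None), not a list
i_tag = soup.select_one('p i')
if i_tag is not None:
    print(i_tag.parent)  # <p>TEXT I WANT <i> – </i></p>
```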