Python BeautifulSoup parent tag
Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, link to the original, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/22023992/
BeautifulSoup parent tag
Asked by porteclefs
I have some html that I want to extract text from. Here's an example of the html:
<p>TEXT I WANT <i> – </i></p>
Now, there are obviously lots of <p> tags in this document, so find('p') is not a good way to get at the text I want to extract. However, that <i> tag is the only one in the document, so I thought I could just find the <i> and then go to its parent.
I've tried:
up = soup.select('p i').parent
and
up = soup.select('i')
print(up.parent)
I've also tried it with .parents, with find_all('i'), and with find('i'), but I always get:
'list' object has no attribute "parent"
What am I doing wrong?
Accepted answer by Totem
This works:
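i_tag = soup.find('i')
# the text node directly before <i> holds the wanted text
# (previous_sibling is the current spelling of the older previousSibling alias)
my_text = str(i_tag.previous_sibling).strip()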
Output:
'TEXT I WANT'
As mentioned in other answers, find_all() returns a list, whereas find() returns the first match or None.
If you are unsure whether an <i> tag is present, you could simply wrap the lookup in a try/except block.
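As an alternative to try/except, the missing-tag case can be handled with an explicit None check. The following is only a minimal sketch: the helper name text_before_i is hypothetical, and passing "html.parser" as the parser is an assumption (the original answers do not name one).

```python
from bs4 import BeautifulSoup


def text_before_i(html):
    """Return the stripped text node just before the first <i>, or None.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")  # parser choice is an assumption
    i_tag = soup.find("i")                      # a Tag, or None if there is no <i>
    if i_tag is None or i_tag.previous_sibling is None:
        return None
    return str(i_tag.previous_sibling).strip()


result = text_before_i("<p>TEXT I WANT <i> – </i></p>")
missing = text_before_i("<p>no italic here</p>")
print(result)   # TEXT I WANT
print(missing)  # None
```

The explicit check keeps an unrelated AttributeError elsewhere in the block from being silently swallowed, which a broad try/except could do.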
Answer by Martijn Pieters
find_all() returns a list. find('i') returns the first matching element, or None.
Thus, use:
try:
    up = soup.find('i').parent
except AttributeError:
    # no <i> element: find() returned None, which has no .parent
    up = None
Demo:
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<p>TEXT I WANT <i> – </i></p>')
>>> soup.find('i').parent
<p>TEXT I WANT <i> – </i></p>
>>> soup.find('i').parent.text
u'TEXT I WANT \u2013 '
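To see why the attempts in the question fail, the sketch below compares what find_all() and find() hand back for the same markup. It assumes Python 3 and the built-in "html.parser" backend; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>TEXT I WANT <i> – </i></p>", "html.parser")

matches = soup.find_all("i")  # list-like collection of Tag objects
first = soup.find("i")        # a single Tag (or None when nothing matches)

is_list = isinstance(matches, list)

# Asking the collection itself for .parent fails, as in the question.
try:
    matches.parent
    raised = False
except AttributeError:
    raised = True

# A single Tag does have a parent.
parent_name = first.parent.name

print(is_list, raised, parent_name)
```

The exact wording of the error message depends on the installed BeautifulSoup version, so the sketch only checks that an AttributeError is raised.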
Answer by amaslenn
Both select() and find_all() return a list of elements, so you need to iterate over the result:
for el in soup.select('i'):
    print(el.parent.text)
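When a document contains several <i> tags, the same loop can collect the text of each parent. This is a small sketch with made-up sample markup; the "html.parser" argument is an assumption.

```python
from bs4 import BeautifulSoup

# Sample markup invented for illustration: two <p> tags contain an <i>, one does not.
html = "<p>first <i>a</i></p><p>no italic</p><p>second <i>b</i></p>"
soup = BeautifulSoup(html, "html.parser")

# One entry per <i> match; each entry is the full text of that tag's parent.
parent_texts = [el.parent.get_text() for el in soup.select("i")]
print(parent_texts)  # ['first a', 'second b']
```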
Answer by Chad Frederick
soup.select() returns a Python list, so you have to "unlist" the variable, e.g.:
>>> [up] = soup.select('i')
>>> print(up.parent)
or
>>> up = soup.select('i')
>>> print(up[0].parent)
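One caveat with the unpacking form is that it only works when there is exactly one match. The sketch below, which assumes the "html.parser" backend and uses illustrative variable names, shows the failure on an empty result and select_one() as a variant that returns None instead.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>TEXT I WANT <i> – </i></p>", "html.parser")

# Exactly one <i>, so single-element unpacking succeeds.
[up] = soup.select("i")
single_parent = up.parent.name

empty = BeautifulSoup("<p>no italic</p>", "html.parser")

# With no matches, unpacking an empty list raises ValueError.
try:
    [up] = empty.select("i")
    unpack_failed = False
except ValueError:
    unpack_failed = True

# select_one() returns the first match, or None when nothing matches.
first_or_none = empty.select_one("i")

print(single_parent, unpack_failed, first_or_none)
```

Indexing with up[0] fails in a similar way on an empty result, but with an IndexError rather than a ValueError.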

