Python BeautifulSoup parent tag

Question

Asked by porteclefs

I have some html that I want to extract text from. Here's an example of the html:

<p>TEXT I WANT <i> &#8211; </i></p>

Now, there are obviously lots of <p> tags in this document, so find('p') is not a good way to get at the text I want to extract. However, that <i> tag is the only one in the document, so I thought I could just find the <i> and then go to its parent.

I've tried:

up = soup.select('p i').parent

and

up = soup.select('i')
print(up.parent)

I've also tried .parents, find_all('i'), and find('i'), but I always get:

'list' object has no attribute "parent"

What am I doing wrong?

Answer 1

Accepted answer by Totem

This works:

i_tag = soup.find('i')
my_text = str(i_tag.previousSibling).strip()

output:

'TEXT I WANT'

As mentioned in other answers, find_all() returns a list, whereas find() returns the first match or None.

If you are unsure whether an i tag is present, you can simply use a try/except block.
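A minimal sketch of that try/except approach, assuming the built-in html.parser backend and a hypothetical helper name, text_before_i, that is not part of the original answer. It also checks the sibling for None, since str(None) would otherwise yield the string 'None':

```python
from bs4 import BeautifulSoup


def text_before_i(html):
    """Return the stripped text just before the first <i>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    try:
        # find() returns None when there is no <i>, so the attribute
        # access below raises AttributeError in that case.
        sibling = soup.find("i").previous_sibling
    except AttributeError:
        return None
    # previous_sibling is None when <i> is the first child of its parent.
    return str(sibling).strip() if sibling is not None else None


found = text_before_i("<p>TEXT I WANT <i> &#8211; </i></p>")
missing = text_before_i("<p>no italics here</p>")
print(found)
print(missing)
```

An explicit `if i_tag is None` check would work equally well; which one reads better is a matter of taste.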

Answer 2

Answer by Martijn Pieters

find_all() returns a list. find('i') returns the first matching element, or None.

Thus, use:

try:
    up = soup.find('i').parent
except AttributeError:
    # no <i> element
    up = None

Demo:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<p>TEXT I WANT <i> &#8211; </i></p>', 'html.parser')
>>> soup.find('i').parent
<p>TEXT I WANT <i> – </i></p>
>>> soup.find('i').parent.text
'TEXT I WANT  – '

Answer 3

Answer by amaslenn

Both select() and find_all() return a list of elements, so iterate over the result:

for el in soup.select('i'):
    print(el.parent.text)
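As a rough illustration of that loop on a document like the one in the question, here is a self-contained sketch. The three-paragraph HTML string is made up for the example, and html.parser is assumed as the backend:

```python
from bs4 import BeautifulSoup

# Hypothetical document: several <p> tags, only one of which contains an <i>.
html = """
<p>first paragraph</p>
<p>TEXT I WANT <i> &#8211; </i></p>
<p>last paragraph</p>
"""
soup = BeautifulSoup(html, "html.parser")

# select() returns a list-like result, so take .parent per element,
# never on the result itself.
parents = [el.parent for el in soup.select("i")]
parent_names = [p.name for p in parents]

for p in parents:
    print(p.get_text())
```

Only the paragraph that actually contains the <i> ends up in parents, which is what makes the "find the child, then go up" approach work when find('p') alone would be ambiguous.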

Answer 4

Answer by Chad Frederick

soup.select() returns a Python list, so you have to 'unlist' the variable, e.g.:

>>> [up] = soup.select('i')
>>> print(up.parent)

or

>>> up = soup.select('i')
>>> print(up[0].parent)
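One caveat worth sketching: the [up] = ... unpacking only succeeds when there is exactly one match, and up[0] raises IndexError on an empty result. The example below assumes a made-up document with two <i> tags and the html.parser backend. It also shows select_one(), which is not mentioned in the answer above and is offered here only as an alternative that returns the first match or None:

```python
from bs4 import BeautifulSoup

# Hypothetical document with two <i> tags, to show the unpacking failure.
soup = BeautifulSoup("<p>one <i>a</i></p><p>two <i>b</i></p>", "html.parser")

matches = soup.select("i")  # list-like result holding both <i> tags
try:
    [up] = matches  # requires exactly one element
except ValueError:
    up = None  # zero matches or more than one match

# select_one() returns the first match, or None when nothing matches.
first = soup.select_one("i")
first_parent_text = first.parent.get_text() if first is not None else None

print(up, first_parent_text)
```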
