Python BS4: Getting the text within a tag

Question

Asked by Milano

I'm using Beautiful Soup. There is a tag like this:

<li><a href="example"> s.r.o., <small>small</small></a></li>

I want to get only the text directly inside the anchor <a> tag, without any text from the <small> tag in the output; i.e. " s.r.o.,"

I tried find('li').text[0], but it does not work.

Is there a command in BS4 which can do that?

Answer 1

Accepted answer by alecxe

One option would be to get the first element from the contents of the a element:

>>> from bs4 import BeautifulSoup
>>> data = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
>>> soup = BeautifulSoup(data, 'html.parser')
>>> print(soup.find('a').contents[0])
 s.r.o.,
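The value returned by contents[0] is a NavigableString, which keeps the surrounding whitespace from the markup. A minimal sketch, assuming Python 3 and the built-in 'html.parser' backend, of cleaning it up with strip():

```python
from bs4 import BeautifulSoup

data = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
soup = BeautifulSoup(data, 'html.parser')

# contents[0] is the first child of <a>: the text node before <small>
first = soup.find('a').contents[0]

# NavigableString subclasses str, so the usual string methods apply
text = first.strip()
print(text)  # s.r.o.,
```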

Another one would be to find the small tag and get its previous sibling:

>>> print(soup.find('small').previous_sibling)
 s.r.o.,

Well, there are all sorts of alternative/crazy options also:

>>> print(next(soup.find('a').descendants))
 s.r.o., 
>>> print(next(iter(soup.find('a'))))
 s.r.o.,
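All of the options above rely on the wanted text being the first child of the a element. If text can appear on either side of nested tags, one possible approach is to keep only the direct children that are text nodes. This is a sketch, not part of the original answer: the helper name direct_text and the extra " tail" text in the sample markup are made up for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Hypothetical markup: text on both sides of the nested <small> tag
data = '<li><a href="example"> s.r.o., <small>small</small> tail</a></li>'
soup = BeautifulSoup(data, 'html.parser')

def direct_text(tag):
    """Join the text nodes that are direct children of tag, skipping nested tags."""
    parts = [child.strip() for child in tag.children
             if isinstance(child, NavigableString)]
    return ' '.join(part for part in parts if part)

result = direct_text(soup.find('a'))
print(result)  # s.r.o., tail
```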

Answer 2

Answered by Padraic Cunningham

Use .children

>>> print(next(soup.find('a').children))
 s.r.o.,
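Since .children is an iterator, taking its first item fails with StopIteration when the tag has no children. A small sketch, assuming Python 3, that passes a default to next() instead; the second, empty anchor in the sample markup is added here only to show that case.

```python
from bs4 import BeautifulSoup

# The second <a> is a hypothetical empty anchor added for illustration
data = ('<li><a href="example"> s.r.o., <small>small</small></a></li>'
        '<li><a href="empty"></a></li>')
soup = BeautifulSoup(data, 'html.parser')

first_link, empty_link = soup.find_all('a')

# next() with a default returns None instead of raising StopIteration
first_child = next(first_link.children, None)
missing_child = next(empty_link.children, None)

print(first_child)
print(missing_child)
```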

Answer 3

Answered by Sumanth Lazarus

If you would like to loop over and print the first content item of every anchor tag in an HTML string or web page (for a web page, fetch the HTML first, e.g. with urlopen from urllib), this works:

from bs4 import BeautifulSoup
data = '<li><a href="example">s.r.o., <small>small</small></a></li> <li><a href="example">2nd</a></li> <li><a href="example">3rd</a></li>'
soup = BeautifulSoup(data, 'html.parser')
a_tag = soup('a')  # shorthand for soup.find_all('a')
for tag in a_tag:
    print(tag.contents[0])  # .contents[0] is the first child of each <a> tag, here its leading text

Output:

s.r.o.,  
2nd
3rd

a_tag is a list containing all anchor tags; collecting them in a list enables group editing (if more than one <a> tag is present).

>>> print(a_tag)
[<a href="example">s.r.o., <small>small</small></a>, <a href="example">2nd</a>, <a href="example">3rd</a>]
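As a possible variation on the loop above, the leading text of each anchor could be collected into a plain list of stripped strings. This is only a sketch: the guard that skips anchors whose first child is not a text node is an assumption about what is wanted, and the variable name texts is illustrative.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

data = ('<li><a href="example">s.r.o., <small>small</small></a></li> '
        '<li><a href="example">2nd</a></li> '
        '<li><a href="example">3rd</a></li>')
soup = BeautifulSoup(data, 'html.parser')

# Keep the first child of each <a> only when it exists and is a text node
texts = [
    tag.contents[0].strip()
    for tag in soup('a')
    if tag.contents and isinstance(tag.contents[0], NavigableString)
]
print(texts)  # ['s.r.o.,', '2nd', '3rd']
```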
