Python BS4: Getting text in tag

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/25251841/

Time: 2020-08-18 19:57:20  Source: igfitidea

BS4: Getting text in tag

python html parsing html-parsing beautifulsoup

Asked by Milano

I'm using Beautiful Soup. There is a tag like this:

<li><a href="example"> s.r.o., <small>small</small></a></li>

I want to get only the text directly inside the anchor <a> tag, without any text from the nested <small> tag in the output; i.e. " s.r.o., "

I tried find('li').text[0], but it does not work.

Is there a command in BS4 which can do that?

Accepted answer by alecxe

One option would be to get the first element from the contents of the a element:

>>> from bs4 import BeautifulSoup
>>> data = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
>>> soup = BeautifulSoup(data, 'html.parser')
>>> print(soup.find('a').contents[0])
 s.r.o., 
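
To make the difference between the attempted `.text[0]` and `.contents[0]` concrete, here is a minimal sketch using the same markup as the question. The variable names are only illustrative, and `html.parser` is assumed as the parser.

```python
from bs4 import BeautifulSoup

data = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
soup = BeautifulSoup(data, "html.parser")
a = soup.find("a")

# .contents lists the direct children: a text node, then the <small> tag.
children = a.contents
first_text = children[0]

# .text joins the text of all descendants, so the <small> text is included.
all_text = a.text

# Indexing a string with [0] yields a single character, not the first node.
first_char = soup.find("li").text[0]

print(repr(str(first_text)))
print(repr(all_text))
print(repr(first_char))
```

In other words, `.text` returns one flat string, so `[0]` on it can only ever give a single character, while `.contents` keeps the child nodes separate.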

Another one would be to find the small tag and get its previous sibling:

>>> print(soup.find('small').previous_sibling)
 s.r.o., 
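
The sibling approach relies on a text node actually sitting in front of `<small>`. A hedged sketch of a more defensive version follows; the helper name `text_before_small` is hypothetical and not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup, NavigableString


def text_before_small(html):
    """Return the text node directly before the first <small>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    small = soup.find("small")
    if small is None:
        return None
    prev = small.previous_sibling
    # previous_sibling is None when <small> is the first child, and it may be
    # another Tag rather than text, so check the type before using it.
    if isinstance(prev, NavigableString):
        return str(prev)
    return None


data = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
print(repr(text_before_small(data)))
```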


Well, there are also all sorts of alternative/crazy options:

>>> print(next(soup.find('a').descendants))
 s.r.o., 
>>> print(next(iter(soup.find('a'))))
 s.r.o., 
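
All of the options above pick the first child, which assumes the wanted text comes first. If the text might be split around nested tags, one possible generalisation is to keep every direct text child and skip the tags. This is a sketch only: `own_text` is a hypothetical helper, and the second sample string with a trailing " Ltd." is invented for illustration.

```python
from bs4 import BeautifulSoup, NavigableString


def own_text(tag):
    """Join the text nodes that are direct children of tag, skipping nested tags."""
    parts = [str(child) for child in tag.children
             if isinstance(child, NavigableString)]
    return "".join(parts).strip()


original = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
# Invented variant: text appears on both sides of the nested tag.
split = '<li><a href="example"> s.r.o., <small>small</small> Ltd.</a></li>'

print(own_text(BeautifulSoup(original, "html.parser").find("a")))
print(own_text(BeautifulSoup(split, "html.parser").find("a")))
```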

Answer by Padraic Cunningham

Use .children:

print(next(soup.find('a').children))
 s.r.o., 
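
Because `.children` is an iterator, the built-in `next()` is the way to take its first item in Python 3. A small sketch, assuming a second, empty anchor that is added here purely to show the default argument:

```python
from bs4 import BeautifulSoup

html = ('<a href="example"> s.r.o., <small>small</small></a>'
        '<a href="empty"></a>')  # the empty anchor is an invented test case
soup = BeautifulSoup(html, "html.parser")
first, empty = soup.find_all("a")

# .children iterates over the direct children, so next() takes the first one.
first_child = next(first.children)

# Passing a default avoids StopIteration when a tag has no children at all.
missing = next(empty.children, None)

print(repr(str(first_child)))
print(missing)
```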

Answer by Sumanth Lazarus

If you would like to loop over all anchor tags in an HTML string or web page and print the leading text of each (for a web page, fetch the HTML first, e.g. with urlopen from urllib), this works:

from bs4 import BeautifulSoup
data = '<li><a href="example">s.r.o., <small>small</small></a></li> <li><a href="example">2nd</a></li> <li><a href="example">3rd</a></li>'
soup = BeautifulSoup(data, 'html.parser')
a_tag = soup('a')  # shorthand for soup.find_all('a')
for tag in a_tag:
    print(tag.contents[0])  # .contents[0] is the first child, here the text inside <a>

Output:

s.r.o.,  
2nd
3rd

a_tag is a list containing all anchor tags; collecting them in a list enables group editing (if more than one <a> tag is present).

>>> print(a_tag)
[<a href="example">s.r.o., <small>small</small></a>, <a href="example">2nd</a>, <a href="example">3rd</a>]
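
As a rough illustration of the "group editing" mentioned above, the collected tags can be modified in a single loop. This is a sketch under assumptions: the `target` attribute and its value are arbitrary examples, not something the original answer prescribes.

```python
from bs4 import BeautifulSoup

data = ('<li><a href="example">s.r.o., <small>small</small></a></li> '
        '<li><a href="example">2nd</a></li> '
        '<li><a href="example">3rd</a></li>')
soup = BeautifulSoup(data, "html.parser")
a_tags = soup("a")  # shorthand for soup.find_all("a")

# Edit every anchor in one pass; the "target" attribute is only an illustration.
for tag in a_tags:
    tag["target"] = "_blank"

# Collect the leading text of each anchor, stripped of surrounding whitespace.
first_texts = [str(tag.contents[0]).strip() for tag in a_tags]

print(first_texts)
print(a_tags[1])
```

The tags in the list are the same objects that live in the parsed tree, so the edits show up when `soup` itself is printed.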