Python: Finding the next occurring tag and its enclosed text with Beautiful Soup

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/21823229/

Time: 2020-08-18 23:41:13  Source: igfitidea

Tags: python, html, python-2.7, beautifulsoup

Asked by PSeUdocode

I'm trying to parse the text inside <blockquote> tags. When I call soup.blockquote.get_text(), I get the result I want for the first blockquote in the HTML file. How do I find the next and each subsequent <blockquote> tag in the file? Maybe I'm just tired and can't find it in the documentation.

Example HTML file:

<html>
<head>header
</head>
<blockquote>I can get this text
</blockquote>
<p>eiaoiefj</p>
<blockquote>trying to capture this next
</blockquote>
<p></p><strong>do not capture this</strong>
<blockquote>
capture this too but separately after "capture this next"
</blockquote>
</html>

The simple Python code:

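from bs4 import BeautifulSoup

# Assumes example.html (the file shown above) is in the working directory.
# Naming the parser explicitly keeps the result consistent across machines.
with open("example.html") as html_doc:
    soup = BeautifulSoup(html_doc, "html.parser")

# Prints the text of the first <blockquote> only
print(soup.blockquote.get_text())
# how to get the next blockquote???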

Accepted answer by falsetru

Use find_next_sibling (if the next tag is not a sibling, use find_next instead).

>>> html = '''
... <html>
... <head>header
... </head>
... <blockquote>blah blah
... </blockquote>
... <p>eiaoiefj</p>
... <blockquote>capture this next
... </blockquote>
... <p></p><strong>don'tcapturethis</strong>
... <blockquote>
... capture this too but separately after "capture this next"
... </blockquote>
... </html>
... '''

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(html)
>>> quote1 = soup.blockquote
>>> quote1.text
u'blah blah\n'
>>> quote2 = quote1.find_next_sibling('blockquote')
>>> quote2.text
u'capture this next\n'
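
To capture every blockquote in turn, as the question asks, the same idea can be repeated in a loop. The following is a minimal sketch rather than part of the original answer: the helper name blockquote_texts and the explicit "html.parser" argument are assumptions made for illustration. It also shows a small case where find_next_sibling finds nothing because the next blockquote is not a sibling, while find_next still reaches it.

```python
from bs4 import BeautifulSoup

html = """
<html>
<head>header
</head>
<blockquote>blah blah
</blockquote>
<p>eiaoiefj</p>
<blockquote>capture this next
</blockquote>
<p></p><strong>do not capture this</strong>
<blockquote>
capture this too but separately after "capture this next"
</blockquote>
</html>
"""

# The parser is named explicitly (an assumption; the answer above omits it)
soup = BeautifulSoup(html, "html.parser")


def blockquote_texts(soup):
    """Hypothetical helper: stripped text of each <blockquote>, in document order."""
    texts = []
    quote = soup.find("blockquote")
    while quote is not None:
        texts.append(quote.get_text().strip())
        # find_next searches everything that follows in the document,
        # whether or not it is a sibling of the current tag
        quote = quote.find_next("blockquote")
    return texts


texts = blockquote_texts(soup)
for text in texts:
    print(text)

# When the first blockquote is wrapped in another tag, the second one
# is no longer its sibling, so only find_next locates it.
nested = BeautifulSoup(
    "<div><blockquote>a</blockquote></div><blockquote>b</blockquote>",
    "html.parser",
)
first = nested.blockquote
no_sibling = first.find_next_sibling("blockquote")  # None: no sibling blockquote
following = first.find_next("blockquote")           # the <blockquote>b</blockquote> tag
```

If the blockquotes are only needed as a group, soup.find_all("blockquote") returns them all at once; the loop above is mainly useful when each step should start from a specific tag.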