beautifulsoup: find_all on bs4.element.ResultSet object or list?
Source: http://stackoverflow.com/questions/36076052/ (StackOverflow, provided under the CC BY-SA 4.0 license; if you use or share it, you must attribute it to the original authors and keep the same license)
Asked by YJZ
Hi, so I apply find_all on a beautifulsoup object and find something, which is a bs4.element.ResultSet object or a list.
I want to run find_all again on that result, but that is not allowed on a bs4.element.ResultSet object. I can loop through each element of the bs4.element.ResultSet object and call find_all on it. But can I avoid looping and just convert it back to a beautifulsoup object?
Please see the code for details. Thanks.
from bs4 import BeautifulSoup

html_1 = """
<table>
<thead>
<tr class="myClass">
<th>A</th>
<th>B</th>
<th>C</th>
<th>D</th>
</tr>
</thead>
</table>
"""
soup = BeautifulSoup(html_1, 'html.parser')
type(soup)  # bs4.BeautifulSoup
# do find_all on the beautifulsoup object
th_all = soup.find_all('th')
# the result is a bs4.element.ResultSet; slicing it gives a plain list
type(th_all)  # bs4.element.ResultSet
type(th_all[0:1])  # list
# now I want to further do find_all
# th_all.find_all(text='A')  # does not work: raises AttributeError
# can I avoid the need for this loop?
for th in th_all:
    th.find_all(text='A')  # works
Answered by alecxe
The ResultSet class is a subclass of list, not of the Tag class, which is where the find* methods are defined. Looping through the results of find_all() is the most common approach:
th_all = soup.find_all('th')
result = []
for th in th_all:
    result.extend(th.find_all(text='A'))
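To make the point about ResultSet concrete, here is a minimal, self-contained sketch. The HTML string and the variable names (is_list, has_find_all, matches) are made up for illustration and are not from the question. It checks that the result of find_all() behaves like a list without the find* methods, then flattens the per-element searches with a list comprehension, which is just a compact form of the loop above. It uses string= rather than text=, assuming Beautiful Soup 4.4.0 or later, where string is the newer name for the same argument.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, only for this sketch
html = "<table><tr><th>A</th><th>B</th><th>A</th></tr></table>"
soup = BeautifulSoup(html, "html.parser")

th_all = soup.find_all("th")

# ResultSet is a list subclass, so it has list behaviour but no find_all()
is_list = isinstance(th_all, list)
has_find_all = hasattr(th_all, "find_all")

# Still a loop, just written as one expression:
# search inside each <th> and collect every match into a single flat list
matches = [m for th in th_all for m in th.find_all(string="A")]
```

The comprehension does not avoid iterating; it only hides the bookkeeping of result = [] and extend().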
Usually, CSS selectors may help you solve it in one go, except that not everything you can do with find_all() is possible with the select() method. For instance, there is no "text" search available in bs4 CSS selectors. But if, for example, you had to find all b elements inside th elements, you could do:
soup.select("th b")
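As a rough illustration of the selector approach for the case described above (b elements inside th elements), the sketch below runs select() with a descendant selector and compares it with the nested find_all() loop. The sample HTML and the names bold_in_headers, labels and looped are assumptions made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, only for this sketch
html = """
<table>
  <tr>
    <th><b>A</b></th>
    <th>B</th>
    <th><b>C</b></th>
  </tr>
  <tr><td><b>not in a header</b></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Descendant combinator: every <b> nested anywhere inside a <th>
bold_in_headers = soup.select("th b")
labels = [b.get_text() for b in bold_in_headers]

# The same elements collected with nested find_all() calls
looped = [b for th in soup.find_all("th") for b in th.find_all("b")]
```

Here the selector replaces the explicit loop entirely, which is why it is convenient whenever the second search can be expressed as a tag, class or attribute match.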