Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow post: http://stackoverflow.com/questions/36076052/

Date: 2020-08-19 17:22:21  Source: igfitidea

beautifulsoup: find_all on bs4.element.ResultSet object or list?

Tags: python, html, beautifulsoup, html-parsing

Asked by YJZ

Hi, so I apply find_all on a BeautifulSoup object and get back something that is a bs4.element.ResultSet object or a list.

I want to run find_all again on that result, but that is not allowed on a bs4.element.ResultSet object. I can loop through each element of the ResultSet and call find_all on it. But can I avoid the loop and just convert the result back to a BeautifulSoup object?

See code for details please. Thanks

from bs4 import BeautifulSoup

html_1 = """
<table>
    <thead>
        <tr class="myClass">
            <th>A</th>
            <th>B</th>
            <th>C</th>
            <th>D</th>
        </tr>
    </thead>
</table>
"""
soup = BeautifulSoup(html_1, 'html.parser')

type(soup) #bs4.BeautifulSoup

# do find_all on beautifulsoup object
th_all = soup.find_all('th')

# the result is of type bs4.element.ResultSet or similarly list
type(th_all) #bs4.element.ResultSet
type(th_all[0:1]) #list

# now I want to further do find_all
th_all.find_all(text='A') #does not work: raises AttributeError

# can I avoid this need of loop?
for th in th_all:
    th.find_all(text='A') #works

Answered by alecxe

The ResultSet class is a subclass of list, not of the Tag class, which is where the find* methods are defined. Looping through the results of find_all() is the most common approach:

th_all = soup.find_all('th')
result = []
for th in th_all:
    result.extend(th.find_all(text='A'))
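
The same loop can also be written as one nested list comprehension, which keeps the flattening step explicit. This is a minimal sketch reusing the question's markup. It passes string= rather than text=, on the assumption that a reasonably recent Beautiful Soup release is installed, and the names is_list and matches are only illustrative:

```python
from bs4 import BeautifulSoup

html_1 = """
<table>
    <thead>
        <tr class="myClass">
            <th>A</th>
            <th>B</th>
            <th>C</th>
            <th>D</th>
        </tr>
    </thead>
</table>
"""
soup = BeautifulSoup(html_1, "html.parser")
th_all = soup.find_all("th")

# ResultSet is a list subclass: it behaves like a list
# but does not provide the find* methods itself
is_list = isinstance(th_all, list)

# each element of the ResultSet is a Tag, and Tag does define find_all(),
# so the search runs per element and the hits are collected into one flat list
matches = [hit for th in th_all for hit in th.find_all(string="A")]
```

The loop is still there, just compressed into one expression. There does not appear to be a way around iterating when the second search has to run on each matched element separately.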

Usually, CSS selectors may help you solve it in one go, except that not everything you can do with find_all() is possible with the select() method. For instance, there is no "text" search available in bs4 CSS selectors. But if, for example, you had to find all b elements inside th elements, you could do:

soup.select("th b")
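
To make the selector approach concrete, here is a small sketch of a descendant selector that finds b elements nested inside th elements in a single call. The markup in html_2 is hypothetical (the question's table contains no b elements), and the variable names are only illustrative:

```python
from bs4 import BeautifulSoup

# hypothetical markup: some <th> cells wrap their text in <b>, one does not
html_2 = """
<table>
    <tr>
        <th><b>A</b></th>
        <th>B</th>
        <th><b>C</b></th>
    </tr>
</table>
"""
soup = BeautifulSoup(html_2, "html.parser")

# "th b" is a descendant selector: every <b> located anywhere inside a <th>
bold_in_th = soup.select("th b")

# select() returns a list of Tag objects, so per-element methods work as usual
texts = [b.get_text() for b in bold_in_th]
```

Here the nesting is expressed in the selector itself, so no explicit loop over the th elements is needed to locate the inner tags.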