Python BeautifulSoup findAll() given multiple classes?

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/18725760/

Time: 2020-08-19 11:36:36  Source: igfitidea

BeautifulSoup findAll() given multiple classes?

Tags: python, html, beautifulsoup, html-parsing

Asked by sebo

I would like to scrape a list of items from a website, and preserve the order that they are presented in. These items are organized in a table, but they can be one of two different classes (in random order).

Is there any way to provide multiple classes and have BeautifulSoup4 find all items which are in any of the given classes?

I need to achieve what this code does, except preserve the order of items as it was in the source code:

items = soup.findAll(True,{'class':'class1'})
items += soup.findAll(True,{'class':'class2'})

Answered by alecxe

One way to do it is to use a regular expression instead of a class name:

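import re

import requests
from bs4 import BeautifulSoup

s = requests.Session()
link = 'https://leaderboards.guildwars2.com/en/na/achievements'
r = s.get(link)

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(r.text, 'html.parser')

# The pattern is tested against each class of a tag, so this matches
# any tag that has the class "equal" or the class "up"
for item in soup.findAll(True, {"class": re.compile("^(equal|up)$")}):
    if 'achievements' in item['class'] and 'number' in item['class']:
        print(item)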
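
The snippet above depends on a live page, so here is a minimal offline sketch of the same regular-expression technique. The markup is made up for illustration (it is not taken from the leaderboard page), and "html.parser" is assumed as the parser.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup, invented for this sketch.
HTML = """
<table><tr>
<td class="achievements number equal">1</td>
<td class="achievements number up">2</td>
<td class="achievements number down">3</td>
<td class="name up">4</td>
</tr></table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# The pattern is checked against each individual class of a tag.
pattern = re.compile(r"^(equal|up)$")
matches = soup.find_all(True, {"class": pattern})
matched_text = [tag.get_text(strip=True) for tag in matches]

# Keep only the tags that also carry both "achievements" and "number".
required = {"achievements", "number"}
filtered_text = [
    tag.get_text(strip=True) for tag in matches if required <= set(tag["class"])
]

print(matched_text)
print(filtered_text)
```

Because a single search is used, the matches come back in document order, which is what the question asks for.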

Answered by Roman Pekar

You can pass a list of classes:

soup.findAll(True, {'class':['class1', 'class2']})

Example:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<html><body><div class="class1"></div><div class="class2"></div><div class="class3"></div></body></html>', 'html.parser')
>>> soup.findAll(True, {"class":["class1", "class2"]})
[<div class="class1"></div>, <div class="class2"></div>]
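
To connect this back to the ordering concern in the question, the following sketch compares two concatenated searches with a single search that takes a list of classes. The table markup and the cell contents are invented for illustration, and "html.parser" is an assumed parser choice.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: rows carry one of two classes, in no fixed pattern.
HTML = """
<table>
  <tr class="class2"><td>a</td></tr>
  <tr class="class1"><td>b</td></tr>
  <tr class="class3"><td>c</td></tr>
  <tr class="class2"><td>d</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Two searches glued together: results are grouped by class,
# so the original document order is lost.
concatenated = soup.find_all(True, {"class": "class1"})
concatenated += soup.find_all(True, {"class": "class2"})

# One search with a list of classes: results follow document order.
combined = soup.find_all(True, {"class": ["class1", "class2"]})

concatenated_text = [tag.get_text(strip=True) for tag in concatenated]
combined_text = [tag.get_text(strip=True) for tag in combined]

print(concatenated_text)
print(combined_text)
```

The row with "class3" is skipped in both cases; only the single list-based search keeps rows "a", "b", "d" in the order they appear in the markup.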

Answered by Bhoopi

I am new to Python and BeautifulSoup, but maybe my answer will help you. I ran into the same situation, where I had to find one kind of tag with any of several classes, so I just passed the classes in a list, and it worked for me. Here is the code snippet:

# Search with a single class
soup.find_all("tr", {"class": "abc"})
# Search with multiple classes (matches rows that have either class)
soup.find_all("tr", {"class": ["abc", "xyz"]})

Answered by Abdelghani Bekka

Or, with a more recent version of BeautifulSoup:

soup.find_all('a', class_=['class1', 'class2'])

Because "class" is a reserved word in Python, passing it as a keyword argument is a syntax error, so BeautifulSoup uses "class_" instead.

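
As a small check of the keyword form, this sketch compares "class_" with the dictionary form used in the earlier answers. The anchor tags and their text are made up for the example, and "html.parser" is an assumed parser choice.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this sketch.
HTML = (
    '<a class="class1" href="#1">one</a>'
    '<a class="class3" href="#3">three</a>'
    '<a class="class2" href="#2">two</a>'
)

soup = BeautifulSoup(HTML, "html.parser")

# Keyword form: the trailing underscore avoids Python's reserved word "class".
by_keyword = soup.find_all("a", class_=["class1", "class2"])

# Dictionary form: "class" is an ordinary string key here, so it is allowed.
by_dict = soup.find_all("a", {"class": ["class1", "class2"]})

keyword_text = [tag.get_text() for tag in by_keyword]
dict_text = [tag.get_text() for tag in by_dict]

print(keyword_text)
print(dict_text)
```

Both forms select the same links in this example, so the choice between them is mostly a matter of taste.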