BeautifulSoup findAll() given multiple classes?
Original question: http://stackoverflow.com/questions/18725760/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by sebo
I would like to scrape a list of items from a website and preserve the order in which they are presented. The items are organized in a table, but each one can belong to either of two different classes (in random order).
Is there any way to provide multiple classes and have BeautifulSoup4 find all items that are in any of the given classes?
I need to achieve what this code does, except that the items should keep the order they have in the page source:
items = soup.findAll(True,{'class':'class1'})
items += soup.findAll(True,{'class':'class2'})
Answered by alecxe
One way to do it is to use a regular expression instead of a class name:
import re
import requests
from bs4 import BeautifulSoup

s = requests.Session()
link = 'https://leaderboards.guildwars2.com/en/na/achievements'
r = s.get(link)
soup = BeautifulSoup(r.text, 'html.parser')

# Match any tag that has a class of exactly "equal" or "up"
for item in soup.findAll(True, {"class": re.compile("^(equal|up)$")}):
    # Keep only tags that also carry the "achievements" and "number" classes
    if 'achievements' in item.attrs['class'] and 'number' in item.attrs['class']:
        print(item)
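The leaderboard page above may no longer be reachable, so here is a minimal offline sketch of the same regular-expression idea. The HTML is made up for illustration (it is not the real page), and the names `html`, `pattern` and `texts` are just placeholders. It assumes `bs4` is installed.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real leaderboard table.
html = """
<table>
  <tr><td class="up">1</td></tr>
  <tr><td class="equal">2</td></tr>
  <tr><td class="down">3</td></tr>
  <tr><td class="up number">4</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# The pattern is tested against each class of a tag individually,
# so the cell with class="up number" matches through its "up" class.
pattern = re.compile(r"^(equal|up)$")
matches = soup.find_all(True, {"class": pattern})

# Results come back in document order.
texts = [tag.get_text() for tag in matches]
print(texts)
```

The anchors `^` and `$` keep the pattern from matching classes that merely contain "up" or "equal" as a substring.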
Answered by Roman Pekar
You can pass a list of class names:
soup.findAll(True, {'class':['class1', 'class2']})
Example:
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<html><body><div class="class1"></div><div class="class2"></div><div class="class3"></div></body></html>', 'html.parser')
>>> soup.findAll(True, {"class":["class1", "class2"]})
[<div class="class1"></div>, <div class="class2"></div>]
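Since the question is really about ordering, here is a small sketch comparing the two-search approach from the question with a single search using a list of classes. The table markup and the variable names are invented for the example, and it assumes `bs4` is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical table in which the two classes appear in mixed order.
html = """
<table>
  <tr class="class2"><td>a</td></tr>
  <tr class="class1"><td>b</td></tr>
  <tr class="other"><td>c</td></tr>
  <tr class="class2"><td>d</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Two separate searches joined together: rows end up grouped by class.
concatenated = soup.find_all(True, {"class": "class1"})
concatenated += soup.find_all(True, {"class": "class2"})

# One search with a list of classes: rows come back in document order.
combined = soup.find_all(True, {"class": ["class1", "class2"]})

concat_order = [row.get_text(strip=True) for row in concatenated]
combined_order = [row.get_text(strip=True) for row in combined]
print(concat_order)    # ['b', 'a', 'd']
print(combined_order)  # ['a', 'b', 'd']
```

The single search walks the tree once, which is why the source order survives.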
Answered by Bhoopi
I am new to Python and BeautifulSoup, but maybe my answer will help you. I ran into the same situation, where I had to find tags of one type with any of several classes, so I just passed the classes in a list and it worked for me. Here is the code snippet:
# Search with a single class
soup.find_all("tr", {"class": "abc"})
# Search with multiple classes (matches rows that have either class)
soup.find_all("tr", {"class": ["abc", "xyz"]})
Answered by Abdelghani Bekka
Or, with more recent versions of BeautifulSoup:
find_all('a', class_=['class1', 'class2'])
"class" is a reserved word in Python, so using it as a keyword argument would be a syntax error; BeautifulSoup accepts "class_" instead.
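For what it's worth, a quick sketch suggesting that the `class_` keyword and the attribute dictionary used in the earlier answers select the same tags. The links and the names `by_keyword` and `by_dict` are made up for the example, and it assumes `bs4` is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical links; only the first and the last have a wanted class.
html = (
    '<a class="class1" href="#1">one</a>'
    '<a class="class3" href="#3">three</a>'
    '<a class="class2" href="#2">two</a>'
)
soup = BeautifulSoup(html, "html.parser")

# Keyword form: class_ avoids the reserved word "class".
by_keyword = soup.find_all("a", class_=["class1", "class2"])

# Dictionary form: "class" is an ordinary string key, so no clash.
by_dict = soup.find_all("a", {"class": ["class1", "class2"]})

keyword_hrefs = [a["href"] for a in by_keyword]
dict_hrefs = [a["href"] for a in by_dict]
print(keyword_hrefs)
print(dict_hrefs)
```

Both forms should print the same two links, in the order they appear in the markup.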