Beautiful Soup: 'ResultSet' object has no attribute 'find_all'?
Original question: http://stackoverflow.com/questions/24108507/
License: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by Anton
I am trying to scrape a simple table using Beautiful Soup. Here is my code:
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://gist.githubusercontent.com/anonymous/c8eedd8bf41098a8940b/raw/c7e01a76d753f6e8700b54821e26ee5dde3199ab/gistfile1.txt'
r = requests.get(url)
soup = BeautifulSoup(r.text)
table = soup.find_all(class_='dataframe')

first_name = []
last_name = []
age = []
preTestScore = []
postTestScore = []

for row in table.find_all('tr'):
    col = table.find_all('td')
    column_1 = col[0].string.strip()
    first_name.append(column_1)
    column_2 = col[1].string.strip()
    last_name.append(column_2)
    column_3 = col[2].string.strip()
    age.append(column_3)
    column_4 = col[3].string.strip()
    preTestScore.append(column_4)
    column_5 = col[4].string.strip()
    postTestScore.append(column_5)

columns = {'first_name': first_name, 'last_name': last_name, 'age': age, 'preTestScore': preTestScore, 'postTestScore': postTestScore}
df = pd.DataFrame(columns)
df
However, whenever I run it, I get this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-116-a900c2872793> in <module>()
14 postTestScore = []
15
---> 16 for row in table.find_all('tr'):
17 col = table.find_all('td')
18
AttributeError: 'ResultSet' object has no attribute 'find_all'
I have read around a dozen StackOverflow questions about this error, and I cannot figure out what I am doing wrong.
Answered by otus
table = soup.find_all(class_='dataframe')
This gives you a result set, i.e. all the elements that match the class. You can either iterate over them or, if you know you only have one dataframe, you can use find instead. From your code it seems the latter is what you need to deal with the immediate problem:
table = soup.find(class_='dataframe')
However, that is not all:
for row in table.find_all('tr'):
    col = table.find_all('td')
You probably want to iterate over the tds in the row here, rather than the whole table. (Otherwise you'll just see the first row over and over.)
for row in table.find_all('tr'):
    for col in row.find_all('td'):
        ...  # handle one cell of the current row
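Putting both fixes together, a minimal sketch of the corrected scrape might look like the following. The inline SAMPLE_HTML, the COLUMN_NAMES list, and the scrape_table helper are hypothetical stand-ins so the example runs offline; they are not taken from the original page, whose actual markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page in the question, so the sketch runs offline.
SAMPLE_HTML = """
<table class="dataframe">
  <thead>
    <tr><th></th><th>first_name</th><th>last_name</th><th>age</th><th>preTestScore</th><th>postTestScore</th></tr>
  </thead>
  <tbody>
    <tr><th>0</th><td>Jason</td><td>Miller</td><td>42</td><td>4</td><td>25</td></tr>
    <tr><th>1</th><td>Molly</td><td>Jacobson</td><td>52</td><td>24</td><td>94</td></tr>
  </tbody>
</table>
"""

# Column names follow the question's variable names.
COLUMN_NAMES = ["first_name", "last_name", "age", "preTestScore", "postTestScore"]


def scrape_table(html):
    """Return a dict mapping each column name to a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find(class_="dataframe")  # find() returns a single Tag, not a ResultSet
    columns = {name: [] for name in COLUMN_NAMES}
    for row in table.find_all("tr"):
        cells = row.find_all("td")  # only the cells of the current row
        if len(cells) != len(COLUMN_NAMES):
            continue  # skip rows without data cells, such as the header row
        for name, cell in zip(COLUMN_NAMES, cells):
            columns[name].append(cell.get_text(strip=True))
    return columns


columns = scrape_table(SAMPLE_HTML)
print(columns["first_name"])
```

The resulting dict has the same shape as the columns dict in the question, so it could be passed to pd.DataFrame(columns) in the same way. Note that every value is still a string at this point.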
Answered by Ralf Haring
The table variable contains an array (a ResultSet, which behaves like a list). You would need to call find_all on its members (even though you know it's an array with only one member), not on the entire thing.
>>> type(table)
<class 'bs4.element.ResultSet'>
>>> type(table[0])
<class 'bs4.element.Tag'>
>>> len(table[0].find_all('tr'))
6
>>>
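The same checks can be reproduced offline. This is a small sketch on a hypothetical two-row table (the html string is made up for illustration), showing that the result of find_all is list-like and that its members are the Tag objects that actually have find_all:

```python
from bs4 import BeautifulSoup
from bs4.element import ResultSet, Tag

# Hypothetical markup, used only so the checks run without a network request.
html = '<table class="dataframe"><tr><td>a</td></tr><tr><td>b</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

tables = soup.find_all(class_="dataframe")
is_result_set = isinstance(tables, ResultSet)
is_list = isinstance(tables, list)           # ResultSet is a list subclass
has_find_all = hasattr(tables, "find_all")   # the list itself has no find_all

first = tables[0]                            # index into the list to get a Tag
is_tag = isinstance(first, Tag)
row_count = len(first.find_all("tr"))

missing = soup.find(class_="no-such-class")  # find() returns None when nothing matches

print(is_result_set, is_list, has_find_all, is_tag, row_count, missing)
```

Because find returns None when nothing matches, code that switches from find_all to find may want to check for None before calling find_all on the result.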
Answered by Padraic Cunningham
Iterate over table and use row.find_all('td')
for row in table:
    col = row.find_all('td')
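If the page could contain more than one matching table, iterating over the ResultSet and then over each table's rows keeps the cells grouped per row. The sketch below assumes a made-up page with two tables of class dataframe; the markup and the rows list are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical page with two matching tables.
html = """
<table class="dataframe"><tr><td>Jason</td><td>42</td></tr></table>
<table class="dataframe"><tr><td>Molly</td><td>52</td></tr><tr><td>Tina</td><td>36</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

rows = []
for table in soup.find_all(class_="dataframe"):  # each item is one table Tag
    for tr in table.find_all("tr"):              # rows of that table only
        rows.append([td.get_text(strip=True) for td in tr.find_all("td")])

print(rows)
```

Each entry in rows is the list of cell strings for one table row, in document order.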