Python Beautiful Soup: 'ResultSet' object has no attribute 'find_all'?

Question

Asked by Anton

I am trying to scrape a simple table using Beautiful Soup. Here is my code:

import requests
from bs4 import BeautifulSoup

url = 'https://gist.githubusercontent.com/anonymous/c8eedd8bf41098a8940b/raw/c7e01a76d753f6e8700b54821e26ee5dde3199ab/gistfile1.txt'
r = requests.get(url)

soup = BeautifulSoup(r.text)
table = soup.find_all(class_='dataframe')

first_name = []
last_name = []
age = []
preTestScore = []
postTestScore = []

for row in table.find_all('tr'):
    col = table.find_all('td')

    column_1 = col[0].string.strip()
    first_name.append(column_1)

    column_2 = col[1].string.strip()
    last_name.append(column_2)

    column_3 = col[2].string.strip()
    age.append(column_3)

    column_4 = col[3].string.strip()
    preTestScore.append(column_4)

    column_5 = col[4].string.strip()
    postTestScore.append(column_5)

columns = {'first_name': first_name, 'last_name': last_name, 'age': age, 'preTestScore': preTestScore, 'postTestScore': postTestScore}
df = pd.DataFrame(columns)
df

However, whenever I run it, I get this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-116-a900c2872793> in <module>()
     14 postTestScore = []
     15 
---> 16 for row in table.find_all('tr'):
     17     col = table.find_all('td')
     18 

AttributeError: 'ResultSet' object has no attribute 'find_all'

I have read around a dozen StackOverflow questions about this error, and I cannot figure out what I am doing wrong.

Answer 1

Answered by otus

table = soup.find_all(class_='dataframe')

This gives you a result set, i.e. all the elements that match the class. You can either iterate over them or, if you know there is only one element with the class dataframe, use find instead. Judging from your code, the latter is what you need to fix the immediate problem:

table = soup.find(class_='dataframe')

However, that is not all:

for row in table.find_all('tr'):
    col = table.find_all('td')

You probably want to iterate over the tds in the row here, rather than the whole table. (Otherwise you'll just see the first row over and over.)

for row in table.find_all('tr'):
    for col in row.find_all('td'):
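Putting both fixes together, a minimal self-contained sketch might look like the following. The inline HTML is a hypothetical stand-in for the page in the question (the real gist is not reproduced here), and the use of the built-in "html.parser" and of get_text(strip=True) are choices made for this example, not something the answer prescribes.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page in the question; the values are made up.
HTML = """
<table class="dataframe">
  <thead>
    <tr><th></th><th>first_name</th><th>last_name</th><th>age</th><th>preTestScore</th><th>postTestScore</th></tr>
  </thead>
  <tbody>
    <tr><th>0</th><td>Jason</td><td>Miller</td><td>42</td><td>4</td><td>25</td></tr>
    <tr><th>1</th><td>Molly</td><td>Jacobson</td><td>52</td><td>24</td><td>94</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find(class_="dataframe")  # a single Tag, not a ResultSet

names = ["first_name", "last_name", "age", "preTestScore", "postTestScore"]
columns = {name: [] for name in names}

for row in table.find_all("tr"):
    cells = row.find_all("td")  # cells of this row only, not of the whole table
    if len(cells) != len(names):
        continue  # skip the header row, which has <th> cells and no <td>
    for name, cell in zip(names, cells):
        columns[name].append(cell.get_text(strip=True))

print(columns["first_name"])
```

The resulting columns dict has the same shape as the one in the question, so it could be passed to pd.DataFrame(columns) once pandas is imported as pd.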

Answer 2

Answered by Ralf Haring

The table variable contains an array (a ResultSet, which behaves like a list). You need to call find_all on its members, not on the whole thing, even if you know it has only one member.

>>> type(table)
<class 'bs4.element.ResultSet'>
>>> type(table[0])
<class 'bs4.element.Tag'>
>>> len(table[0].find_all('tr'))
6
>>>
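The same check can be run as a script. This is a small sketch on a made-up two-row table (the HTML string and the variable names matches and first are assumptions for illustration); it shows that find_all returns a list-like ResultSet without a find_all method of its own, while find and indexing both give a Tag.

```python
from bs4 import BeautifulSoup
from bs4.element import ResultSet, Tag

# Made-up markup for illustration only.
HTML = '<table class="dataframe"><tr><td>a</td></tr><tr><td>b</td></tr></table>'
soup = BeautifulSoup(HTML, "html.parser")

matches = soup.find_all(class_="dataframe")  # every match, as a ResultSet
first = soup.find(class_="dataframe")        # only the first match, as a Tag

print(isinstance(matches, ResultSet), isinstance(matches, list))
print(hasattr(matches, "find_all"))   # the ResultSet itself cannot be searched
print(isinstance(matches[0], Tag), isinstance(first, Tag))
print(len(matches[0].find_all("tr")))  # searching a member works
```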

Answer 3

Answered by Padraic Cunningham

Iterate over table and use row.find_all('td')

for row in table:
    col = row.find_all('td')
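As a rough sketch of what this loop does when the page has more than one matching element: each item yielded by the ResultSet is a whole matching Tag (named row in the snippet above), so find_all('td') returns every cell inside that element. The two-table HTML and the name cells_per_table below are made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up markup with two tables sharing the same class.
HTML = """
<table class="dataframe"><tr><td>a</td><td>b</td></tr></table>
<table class="dataframe"><tr><td>c</td></tr><tr><td>d</td></tr></table>
"""
soup = BeautifulSoup(HTML, "html.parser")

cells_per_table = []
for table in soup.find_all(class_="dataframe"):  # iterate over the ResultSet
    cells = table.find_all("td")                 # all cells in this one table
    cells_per_table.append([cell.get_text(strip=True) for cell in cells])

print(cells_per_table)
```

To keep the cells grouped by row, an inner loop over table.find_all('tr') is still needed.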
