Python 3.x - iloc throws error - "single positional indexer is out-of-bounds"

Question

Asked by Rohan Bapat

I am scraping election data from a website and trying to store it in a DataFrame:

import pandas as pd
import bs4
import requests

columns = ['Candidate','Party','Criminal Cases','Education','Age','Total Assets','Liabilities']

df = pd.DataFrame(columns = columns)

ind=1

url = requests.get("http://myneta.info/up2007/index.php?action=show_candidates&constituency_id=341")
soup = bs4.BeautifulSoup(url.content)

for content in soup.findAll("td")[16:]:
    df.iloc[ind//7,ind%7-1] = content.text
    ind=ind+1
print(df)

Essentially, each iteration yields a value in content.text, which I then write into the table. The loop should fill df in the following order:

df[0,0]
df[0,1]
df[0,2]
.
.
.
df[1,0]
df[1,1]
.
.

and so on. Unfortunately, iloc throws an error: "single positional indexer is out-of-bounds". The funny part is that when I try df.iloc[0,0] = content.text outside the for loop (in a separate cell, for testing purposes), the code works properly, but inside the for loop it raises the error. I believe it might be something trivial, but I am unable to understand it. Please help.

Answer 1

Answered by Ilja Everilä

DataFrame.iloc cannot enlarge its target object. That used to be the wording of the error message, but it has changed since version 0.15.
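
As a minimal sketch of what "cannot enlarge" means here (assuming a reasonably recent pandas; the exact wording of the message varies between versions, so the example only checks the exception type): positional assignment into an empty DataFrame fails, whereas label-based .loc assignment is allowed to add a row.

```python
import pandas as pd

columns = ['Candidate', 'Party', 'Criminal Cases', 'Education',
           'Age', 'Total Assets', 'Liabilities']
df = pd.DataFrame(columns=columns)  # seven columns, zero rows

# Positional assignment into a row that does not exist raises IndexError.
try:
    df.iloc[0, 0] = "example"
    iloc_failed = False
except IndexError as exc:
    iloc_failed = True
    print("iloc failed:", exc)

# Label-based .loc, by contrast, may add a new row ("setting with enlargement").
df.loc[0] = ["example"] * len(columns)
print(df.shape)
```

Growing a frame this way still copies data on every new row, so it is shown only to illustrate the difference between the two indexers.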

In general, a DataFrame is not meant to be built one row at a time; doing so is very inefficient. Instead, you should create a more traditional data structure and populate a DataFrame from it:

table = soup.find(id='table1')
rows = table.find_all('tr')[1:]
data = [[cell.text for cell in row.find_all('td')] for row in rows]
df = pd.DataFrame(data=data, columns=columns)

From inspecting the page in your request, it seems you were after the table with the id "table1", which has the header as its first row (a poor choice by the authors of that page; it should have been in <thead>, not the body). So skip the first row ([1:]) and then build a list of lists from the cells of the remaining rows.
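
If all you have is the flat sequence of cell texts, as in the question's loop, the same "build a list of lists first" idea can be sketched without any positional assignment. The cell values below are made-up placeholders, not data from the page, and the example assumes every row has exactly as many cells as there are columns.

```python
import pandas as pd

columns = ['Candidate', 'Party', 'Criminal Cases', 'Education',
           'Age', 'Total Assets', 'Liabilities']

# Hypothetical stand-in for the text of the scraped <td> cells, in page order.
cells = ["A. Kumar", "Party X", "0", "Graduate", "45", "100", "10",
         "B. Singh", "Party Y", "1", "12th Pass", "52", "200", "0"]

# Slice the flat list into consecutive chunks, one chunk per table row.
width = len(columns)
rows = [cells[i:i + width] for i in range(0, len(cells), width)]

# Build the DataFrame once, from the complete list of rows.
df = pd.DataFrame(rows, columns=columns)
print(df)
```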

Of course you could also just let pandas worry about parsing and all:

url = "http://myneta.info/up2007/index.php?action=show_candidates&constituency_id=341"
df = pd.read_html(url, header=0)[2]  # Pick the 3rd table in the page

Answer 2

Answered by user3404344

This is a workaround. I get the same iloc error with my pandas version. The modified code below avoids it by adding a blank record on each iteration (a 1-row DataFrame appended to the existing one) before assigning a value into it.

import pandas as pd
import bs4
import requests

columns = ['Candidate','Party','Criminal Cases','Education','Age','Total Assets','Liabilities']

df = pd.DataFrame(columns = columns)

ind=1
url = requests.get("http://myneta.info/up2007/index.php?action=show_candidates&constituency_id=341")
soup = bs4.BeautifulSoup(url.content)

for content in soup.findAll("td")[16:]:
    data = pd.DataFrame({columns[0]:"",
                     columns[1]:"",
                     columns[2]:"",
                     columns[3]:"",
                     columns[4]:"",
                     columns[5]:"",
                     columns[6]:"",
                    },index=[0])
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    df = pd.concat([df, data], ignore_index=True)
    df.iloc[ind//7,ind%7-1] = content.text
    ind=ind+1
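
One detail of the index arithmetic in this loop may be worth checking. With a counter that starts at 1, ind % 7 - 1 evaluates to -1 on every seventh cell while ind // 7 has already moved on to the next row, so that value appears to land in the last column of the following row. The helper names below are hypothetical and exist only to print the two mappings side by side; a zero-based counter with divmod is one possible alternative.

```python
WIDTH = 7  # number of columns in the table

def one_based(ind):
    """Mapping used in the loop above, with a counter starting at 1."""
    return ind // WIDTH, ind % WIDTH - 1

def zero_based(i):
    """Alternative mapping with a counter starting at 0."""
    return divmod(i, WIDTH)

# (row, column) pairs for the first eight cells under each scheme.
print([one_based(ind) for ind in range(1, 9)])
print([zero_based(i) for i in range(0, 8)])
```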
