Python 3.x - iloc throws error - "single positional indexer is out-of-bounds"

Source: Stack Overflow, http://stackoverflow.com/questions/37959214/ (retrieved 2020-08-19 via igfitidea). The content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors and cite the original URL.

Tags: python, indexing, dataframe, web-scraping

Asked by Rohan Bapat

I am scraping election data from a website and trying to store it in a dataframe

import pandas as pd
import bs4
import requests

columns = ['Candidate','Party','Criminal Cases','Education','Age','Total Assets','Liabilities']

df = pd.DataFrame(columns = columns)

ind=1

url = requests.get("http://myneta.info/up2007/index.php?action=show_candidates&constituency_id=341")
soup = bs4.BeautifulSoup(url.content)

for content in soup.findAll("td")[16:]:
    df.iloc[ind//7,ind%7-1] = content.text
    ind=ind+1
print(df)

Essentially, each iteration yields a value in content.text, which I then store in the table. The loop should populate df in the following sequence:

df[0,0]
df[0,1]
df[0,2]
.
.
.
df[1,0]
df[1,1]
.
.

and so on. Unfortunately, iloc throws an error: "single positional indexer is out-of-bounds". The funny part is that when I try df.iloc[0,0] = content.text outside the for loop (in a separate cell, for testing), the code works properly, but inside the for loop it raises the error. I believe it might be something trivial, but I am unable to understand it. Please help.

Answered by Ilja Everilä

DataFrame.iloc cannot enlarge its target object. This used to be the error message, but it has changed since version 0.15.
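
To make the point concrete, here is a minimal sketch (not part of the original answer) of what happens when iloc is asked to write into a row that does not exist yet, next to label-based assignment with .loc, which can add a row. The two-column frame and the sample values are made up for illustration, and the exact error text depends on the pandas version, so only the exception type is checked.

```python
import pandas as pd

# Empty frame with columns but no rows, like the one in the question
# (shortened to two hypothetical columns for brevity).
df = pd.DataFrame(columns=["Candidate", "Party"])

error = None
try:
    # There is no row at position 0, and iloc will not create one.
    df.iloc[0, 0] = "Candidate A"
except IndexError as exc:
    error = exc
    print(type(exc).__name__, exc)

# Label-based assignment can enlarge the frame: this adds a row labelled 0.
df.loc[len(df)] = ["Candidate A", "Party X"]
print(df.shape)
```

Adding rows one at a time with .loc still copies data on each step, so it is shown here only to contrast with iloc, not as a recommended way to fill a large table.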

In general, a DataFrame is not meant to be built one row at a time; doing so is very inefficient. Instead, you should create a more traditional data structure and populate a DataFrame from it:

table = soup.find(id='table1')
rows = table.find_all('tr')[1:]
data = [[cell.text for cell in row.find_all('td')] for row in rows]
df = pd.DataFrame(data=data, columns=columns)

From inspecting the page in your request, it seems you were after the table with the id "table1", whose first row is the header (a poor choice by the authors of that page; it should have been in <thead>, not the body). So skip the first row ([1:]) and then build a list of lists from the cells of the remaining rows.
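
If the cell texts are only available as one flat sequence, as in the question's loop over every td, the same list-of-lists idea can be sketched by slicing the sequence into chunks of seven. This is an illustration rather than the answer's own code: the cells list below is hypothetical stand-in data, not values from the real page, and it assumes the cells arrive in row-major order.

```python
import pandas as pd

columns = ['Candidate', 'Party', 'Criminal Cases', 'Education', 'Age',
           'Total Assets', 'Liabilities']

# Hypothetical cell texts standing in for
# [td.text for td in soup.find_all("td")[16:]].
cells = [
    "Candidate A", "Party X", "0", "Graduate", "45", "1000", "0",
    "Candidate B", "Party Y", "1", "12th Pass", "52", "2500", "300",
]

n = len(columns)
# Ignore any trailing cells that do not fill a complete row.
usable = len(cells) - len(cells) % n
# Slice the flat list into consecutive rows of n cells each.
rows = [cells[i:i + n] for i in range(0, usable, n)]

# Build the frame once, from the finished list of lists.
df = pd.DataFrame(rows, columns=columns)
print(df)
```

Building the plain Python list first and constructing the DataFrame in a single call avoids both the iloc error and the cost of growing the frame cell by cell.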

Of course you could also just let pandas worry about parsing and all:

url = "http://myneta.info/up2007/index.php?action=show_candidates&constituency_id=341"
df = pd.read_html(url, header=0)[2]  # Pick the 3rd table in the page

Answered by user3404344

This is a workaround. I get the same iloc error with my pandas version. This modified code overcomes it by creating a blank record on each iteration (building a 1-row dataframe and appending it to the existing one) before assigning a value into it.

import pandas as pd
import bs4
import requests

columns = ['Candidate','Party','Criminal Cases','Education','Age','Total Assets','Liabilities']

df = pd.DataFrame(columns = columns)

ind=1
url = requests.get("http://myneta.info/up2007/index.php?action=show_candidates&constituency_id=341")
soup = bs4.BeautifulSoup(url.content)

for content in soup.findAll("td")[16:]:
    data = pd.DataFrame({columns[0]:"",
                     columns[1]:"",
                     columns[2]:"",
                     columns[3]:"",
                     columns[4]:"",
                     columns[5]:"",
                     columns[6]:"",
                    },index=[0])
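    # Note: DataFrame.append was removed in pandas 2.0; there, use
    # df = pd.concat([df, data], ignore_index=True) instead.
    df=df.append(data, ignore_index=True)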
    df.iloc[ind//7,ind%7-1] = content.text
    ind=ind+1
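
One thing to watch in the loop above: because ind starts at 1, the expression ind%7-1 evaluates to -1 for every seventh value, and ind//7 has already moved on to the next row, so the last column ends up shifted down by one row. The short sketch below (the helper names are invented for this illustration) compares that formula with a zero-based divmod, which keeps each group of seven values on the same row.

```python
N_COLS = 7  # number of columns in the question's table

def position_from_question(ind):
    """Row/column computed by the loop's formula, where ind starts at 1."""
    return ind // N_COLS, ind % N_COLS - 1

def position_zero_based(i):
    """Row/column for a zero-based running index i."""
    return divmod(i, N_COLS)

# Compare the two mappings around the end of the first row.
for ind in (1, 6, 7, 8):
    print(ind, position_from_question(ind), position_zero_based(ind - 1))
```

With a counter from enumerate(...) starting at 0, divmod(i, 7) gives the intended row and column directly.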