Python Pandas KeyError: value not in index
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original: http://stackoverflow.com/questions/38462920/
Asked by xpt
I have the following code:
df = pd.read_csv(CsvFileName)
p = df.pivot_table(index=['Hour'], columns='DOW', values='Changes', aggfunc=np.mean).round(0)
p.fillna(0, inplace=True)
p[["1Sun", "2Mon", "3Tue", "4Wed", "5Thu", "6Fri", "7Sat"]] = p[["1Sun", "2Mon", "3Tue", "4Wed", "5Thu", "6Fri", "7Sat"]].astype(int)
It always worked until I hit a csv file that doesn't cover every day of the week. For example, with the following .csv file,
DOW,Hour,Changes
4Wed,01,237
3Tue,07,2533
1Sun,01,240
3Tue,12,4407
1Sun,09,2204
1Sun,01,240
1Sun,01,241
1Sun,01,241
3Tue,11,662
4Wed,01,4
2Mon,18,4737
1Sun,15,240
2Mon,02,4
6Fri,01,1
1Sun,01,240
2Mon,19,2300
2Mon,19,2532
I get the following error:
KeyError: "['5Thu' '7Sat'] not in index"
It seems to have a very easy fix, but I'm just too new to Python to know how to fix it.
Answered by piRSquared
Use reindex to get all the columns you need. It preserves the columns that are already there and adds the missing ones as empty (NaN) columns.
p = p.reindex(columns=['1Sun', '2Mon', '3Tue', '4Wed', '5Thu', '6Fri', '7Sat'])
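To make the behaviour of reindex concrete, here is a minimal sketch on a small, made-up frame (the values and the shortened `wanted` list are assumptions for illustration, not the asker's data). It shows that a missing column comes back as NaN by default, and that the `fill_value` argument can supply a different default, which matters if the result is later cast to int.

```python
import pandas as pd

# Hypothetical pivot result that only contains two of the three wanted days.
p = pd.DataFrame({"1Sun": [240.0, 2204.0], "3Tue": [0.0, 4407.0]},
                 index=["01", "09"])
wanted = ["1Sun", "2Mon", "3Tue"]

# Default behaviour: the missing "2Mon" column is added and filled with NaN.
with_nan = p.reindex(columns=wanted)

# With fill_value=0 the added column holds zeros, so an int cast is safe.
with_zero = p.reindex(columns=wanted, fill_value=0).astype(int)

print(with_nan)
print(with_zero)
```

Casting `with_nan` directly with `astype(int)` would raise, because NaN has no integer representation.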
So, your entire code example should look like this:
import numpy as np
import pandas as pd

df = pd.read_csv(CsvFileName)
p = df.pivot_table(index=['Hour'], columns='DOW', values='Changes', aggfunc=np.mean).round(0)
p.fillna(0, inplace=True)
columns = ["1Sun", "2Mon", "3Tue", "4Wed", "5Thu", "6Fri", "7Sat"]
# fill_value=0 keeps the added columns free of NaN, which astype(int) cannot convert
p = p.reindex(columns=columns, fill_value=0)
p[columns] = p[columns].astype(int)
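As a self-contained check, the sketch below runs the same steps on the sample rows from the question, read from an in-memory string instead of a file. Reading `Hour` as a string (to keep the leading zeros) and passing `fill_value=0` to reindex are my assumptions; the second one ensures the added `5Thu` and `7Sat` columns hold zeros rather than NaN before the cast to int.

```python
import io
import pandas as pd

# The sample rows from the question, inlined so the sketch runs on its own.
csv_text = """DOW,Hour,Changes
4Wed,01,237
3Tue,07,2533
1Sun,01,240
3Tue,12,4407
1Sun,09,2204
1Sun,01,240
1Sun,01,241
1Sun,01,241
3Tue,11,662
4Wed,01,4
2Mon,18,4737
1Sun,15,240
2Mon,02,4
6Fri,01,1
1Sun,01,240
2Mon,19,2300
2Mon,19,2532
"""

# Assumption: keep Hour as text so "01" is not parsed as the integer 1.
df = pd.read_csv(io.StringIO(csv_text), dtype={"Hour": str})

columns = ["1Sun", "2Mon", "3Tue", "4Wed", "5Thu", "6Fri", "7Sat"]
p = df.pivot_table(index="Hour", columns="DOW", values="Changes",
                   aggfunc="mean").round(0)

# Add the missing days as zero columns, zero out empty cells, then cast.
p = p.reindex(columns=columns, fill_value=0).fillna(0).astype(int)
print(p)
```

The result has one row per hour seen in the data and all seven day columns, with `5Thu` and `7Sat` entirely zero.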
Answered by ILikeWhiskey
I had a very similar issue. I got the same error because the csv contained spaces in the header. My csv contained a header "Gender " (with a trailing space) and I had it listed as:
[['Gender']]
If it's easy enough for you to edit your csv, you can use the Excel formula TRIM() to strip the extra spaces from the cells, or remove them in pandas like this:
df.columns = df.columns.to_series().apply(lambda x: x.strip())
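A small reproduction of the trailing-space problem may help. The two-row csv below is invented for illustration; it uses `df.columns.str.strip()`, a shorter vectorized equivalent of the apply/lambda form.

```python
import io
import pandas as pd

# Made-up csv whose second header is "Gender " with a trailing space.
csv_text = "Name,Gender \nAlice,F\nBob,M\n"
df = pd.read_csv(io.StringIO(csv_text))

raised = False
try:
    df[["Gender"]]          # fails: the real column name is "Gender "
except KeyError as err:
    raised = True
    print("KeyError:", err)

# Strip leading and trailing whitespace from every column name.
df.columns = df.columns.str.strip()
print(df[["Gender"]])
```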
Answered by Paul Gheno
I had the same issue.
During initial development I used a .csv file (with commas as the separator) that I modified a bit before saving it. After saving, the commas had become semicolons.
On Windows this depends on the "List separator" setting in the "Regional and Language Options" customization screen. That is the character Windows applications expect as the CSV separator.
When testing with a brand-new file, I ran into this issue. I removed the 'sep' argument from the read_csv call. Before:
df1 = pd.read_csv('myfile.csv', sep=',')
After:
df1 = pd.read_csv('myfile.csv')
That way, the issue disappeared.
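One caveat worth keeping in mind: read_csv already defaults to sep=',', so dropping the argument does not by itself change how a semicolon-separated file is parsed. If a file may come back from a spreadsheet with a different separator, one option is to let pandas detect it with `sep=None` and the Python engine. The inline sample below is made up for illustration.

```python
import io
import pandas as pd

# Made-up sample of a file that was re-saved with semicolons.
semicolon_csv = "DOW;Hour;Changes\n1Sun;01;240\n2Mon;19;2300\n"

# With the default sep=',', the whole header line becomes one column name,
# so looking up "DOW" or "Changes" would raise a KeyError.
wrong = pd.read_csv(io.StringIO(semicolon_csv))
print(list(wrong.columns))

# sep=None with engine="python" asks pandas to sniff the delimiter.
right = pd.read_csv(io.StringIO(semicolon_csv), sep=None, engine="python")
print(list(right.columns))
```

Passing `sep=';'` explicitly is the simpler choice when the separator is known in advance.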
Answered by Edwin Paul
Please try this to clean and format your column names:
df.columns = (df.columns.str.strip().str.upper()
              .str.replace(' ', '_', regex=False)
              .str.replace('(', '', regex=False)
              .str.replace(')', '', regex=False))
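To see what this cleanup does, here is a short sketch on invented column names. It passes `regex=False` so that the parentheses are treated as literal characters regardless of the pandas version's default.

```python
import pandas as pd

# Hypothetical messy headers: stray spaces, mixed case, parentheses.
df = pd.DataFrame(columns=[" Gender ", "unit price (usd)", "DOW"])

df.columns = (df.columns.str.strip().str.upper()
              .str.replace(" ", "_", regex=False)
              .str.replace("(", "", regex=False)
              .str.replace(")", "", regex=False))

print(list(df.columns))
```

Because the names are upper-cased, any later lookup has to use the upper-case form (for example `df["GENDER"]`), otherwise the same KeyError comes back.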