Python Pandas KeyError: value not in index

Question

Asked by xpt

I have the following code,

import numpy as np
import pandas as pd

df = pd.read_csv(CsvFileName)

p = df.pivot_table(index=['Hour'], columns='DOW', values='Changes', aggfunc=np.mean).round(0)
p.fillna(0, inplace=True)

p[["1Sun", "2Mon", "3Tue", "4Wed", "5Thu", "6Fri", "7Sat"]] = p[["1Sun", "2Mon", "3Tue", "4Wed", "5Thu", "6Fri", "7Sat"]].astype(int)

It always worked until the CSV file stopped covering every day of the week. For example, with the following .csv file,

DOW,Hour,Changes
4Wed,01,237
3Tue,07,2533
1Sun,01,240
3Tue,12,4407
1Sun,09,2204
1Sun,01,240
1Sun,01,241
1Sun,01,241
3Tue,11,662
4Wed,01,4
2Mon,18,4737
1Sun,15,240
2Mon,02,4
6Fri,01,1
1Sun,01,240
2Mon,19,2300
2Mon,19,2532

I'll get the following error:

KeyError: "['5Thu' '7Sat'] not in index"

It seems to have a very easy fix, but I'm just too new to Python to know how to fix it.

Answer 1

Answered by piRSquared

Use reindex to get all the columns you need. It preserves the ones that are already there and adds empty (NaN) columns for the ones that are missing.

p = p.reindex(columns=['1Sun', '2Mon', '3Tue', '4Wed', '5Thu', '6Fri', '7Sat'])
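As a rough illustration of what reindex does here, the sketch below uses a small made-up frame (the values and the names `all_days`, `with_nan`, and `filled` are assumptions for the example, not part of the original code). It shows that the added columns come back as NaN unless a `fill_value` is supplied.

```python
import pandas as pd

# Toy pivot-like frame that only has three of the seven weekday columns.
p = pd.DataFrame(
    {"1Sun": [240.0, 2204.0], "3Tue": [0.0, 2533.0], "4Wed": [120.0, 0.0]},
    index=pd.Index(["01", "09"], name="Hour"),
)

all_days = ["1Sun", "2Mon", "3Tue", "4Wed", "5Thu", "6Fri", "7Sat"]

# Labels that a plain p[all_days] lookup would report as "not in index".
missing = [day for day in all_days if day not in p.columns]

# Without fill_value, the newly added columns are entirely NaN.
with_nan = p.reindex(columns=all_days)

# With fill_value=0 they are zero-filled, so a cast to int is safe.
filled = p.reindex(columns=all_days, fill_value=0).astype(int)
```

Columns that already exist keep their values in both cases; only the labels listed in `missing` are created.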

So, your entire code example should look like this:

import numpy as np
import pandas as pd

df = pd.read_csv(CsvFileName)

p = df.pivot_table(index=['Hour'], columns='DOW', values='Changes', aggfunc=np.mean).round(0)
p.fillna(0, inplace=True)

columns = ["1Sun", "2Mon", "3Tue", "4Wed", "5Thu", "6Fri", "7Sat"]
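# fill_value=0 keeps the newly added columns free of NaN, so astype(int) below does not fail
p = p.reindex(columns=columns, fill_value=0)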
p[columns] = p[columns].astype(int)
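To try the approach end to end without a file on disk, the sample rows from the question can be read through `io.StringIO`. This is only a sketch: `csv_text` and `all_days` are names chosen for the example, `aggfunc="mean"` is used in place of `np.mean`, and `fillna(0)` is applied after `reindex` so that the added columns are zero-filled before the integer cast.

```python
import io

import pandas as pd

# Sample data copied from the question; it has no 5Thu or 7Sat rows.
csv_text = """DOW,Hour,Changes
4Wed,01,237
3Tue,07,2533
1Sun,01,240
3Tue,12,4407
1Sun,09,2204
1Sun,01,240
1Sun,01,241
1Sun,01,241
3Tue,11,662
4Wed,01,4
2Mon,18,4737
1Sun,15,240
2Mon,02,4
6Fri,01,1
1Sun,01,240
2Mon,19,2300
2Mon,19,2532
"""

df = pd.read_csv(io.StringIO(csv_text))

p = df.pivot_table(index="Hour", columns="DOW", values="Changes", aggfunc="mean").round(0)

all_days = ["1Sun", "2Mon", "3Tue", "4Wed", "5Thu", "6Fri", "7Sat"]

# Add the missing weekdays first, then fill every NaN and cast to int.
p = p.reindex(columns=all_days).fillna(0).astype(int)
```

Note that `read_csv` parses the Hour column as integers here, so the index holds 1, 2, 7 and so on rather than the strings "01", "02", "07".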

Answer 2

Answered by ILikeWhiskey

I had a very similar issue. I got the same error because the csv contained spaces in the header. My csv contained a header "Gender " and I had it listed as:

[['Gender']]

If it's easy enough for you to edit your csv, you can use the Excel function TRIM() to strip leading and trailing spaces from the cells.

Or strip the whitespace from the column names in pandas like this:

df.columns = df.columns.to_series().apply(lambda x: x.strip())
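A small sketch of the trailing-space problem, using a made-up two-column CSV (the data and the names `csv_text`, `had_exact_match`, and `genders` are assumptions for the example). It uses `df.columns.str.strip()`, which should give the same result as the `apply` version above.

```python
import io

import pandas as pd

# Hypothetical CSV whose second header has a trailing space: "Gender ".
csv_text = "Name,Gender \nAnn,F\nBob,M\n"
df = pd.read_csv(io.StringIO(csv_text))

# The header is stored with its trailing space, so "Gender" is not found.
had_exact_match = "Gender" in df.columns

# Strip leading and trailing whitespace from every column name.
df.columns = df.columns.str.strip()

# The lookup that previously raised KeyError now succeeds.
genders = df[["Gender"]]
```

Printing `list(df.columns)` before the strip is a quick way to spot stray whitespace, since the quotes in the output make the extra space visible.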

Answer 3

Answered by Paul Gheno

I had the same issue.

During initial development I used a .csv file (with a comma as the separator) that I had modified a bit before saving it. After saving, the commas had become semicolons.

On Windows, this depends on the List separator setting in the "Regional and Language Options" customize screen. That is the character Windows applications expect to be the CSV separator.

When testing from a brand new file I encountered that issue.

I removed the 'sep' argument from the read_csv call. Before:

df1 = pd.read_csv('myfile.csv', sep=',')

After:

df1 = pd.read_csv('myfile.csv')

That way, the issue disappeared.
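When it is unclear which separator a file ended up with, one option is to let pandas detect it by passing `sep=None` together with `engine="python"`. The sketch below uses made-up semicolon-separated data (the names `semicolon_csv`, `wrong`, and `sniffed` are assumptions for the example) and is meant as an illustration rather than a description of what fixed the file above.

```python
import io

import pandas as pd

# Hypothetical file content that was saved with semicolons instead of commas.
semicolon_csv = "DOW;Hour;Changes\n1Sun;01;240\n2Mon;19;2300\n"

# With the default comma separator, each row collapses into a single column,
# so later lookups such as df["Changes"] raise KeyError.
wrong = pd.read_csv(io.StringIO(semicolon_csv))

# sep=None with the python engine asks pandas to sniff the delimiter.
sniffed = pd.read_csv(io.StringIO(semicolon_csv), sep=None, engine="python")
```

If the separator is known in advance, passing it explicitly (for example `sep=";"`) is simpler and avoids relying on detection.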

Answer 4

Answered by Edwin Paul

Please try this to clean and format your column names:

df.columns = (df.columns.str.strip().str.upper()
              .str.replace(' ', '_')
              .str.replace('(', '')
              .str.replace(')', ''))
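Here is the same cleaning chain applied to a few made-up headers, as a sketch (the column names are assumptions for the example). It passes `regex=False` explicitly so that "(" and ")" are treated as literal characters regardless of the pandas version's default.

```python
import pandas as pd

# Hypothetical messy headers: stray spaces, mixed case, parentheses.
df = pd.DataFrame(columns=[" gender ", "Hourly Rate (USD)", "dow"])

df.columns = (
    df.columns.str.strip()
    .str.upper()
    .str.replace(" ", "_", regex=False)
    .str.replace("(", "", regex=False)
    .str.replace(")", "", regex=False)
)

cleaned = list(df.columns)
```

After this step the names are upper case, so later lookups need to use the cleaned spelling, for example `df["GENDER"]` rather than `df["Gender"]`.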
