KeyError when setting the index of a Pandas DataFrame

Question

Asked by Iwan

I'm getting a KeyError when trying to set the index of my DataFrame. I haven't encountered this before when setting the index the same way, and I'm wondering what's going wrong. The data has no column headers, so the DataFrame headers are 0, 1, 2, 3, 4, 5, etc. The error occurs on any column header.

I receive KeyError: '0' when trying to use the first column (which I want to use as the only index).

For context: in the sample below, I'm selecting macro-enabled Excel spreadsheets, squeezing the data, and reading and converting them into DataFrames.

I then want to include the filename in a column, set the index, and strip whitespace so that I can use index labels to extract the data I need. Not every worksheet will have the index labels, so I use try/except to skip the worksheets that don't contain those labels in the index. Finally, I want to concatenate the results into one DataFrame and drop unused columns.

import itertools
import glob
from openpyxl import load_workbook
from pandas import DataFrame
import pandas as pd
import os

def get_data(ws):
    for row in ws.values:
        row_it = iter(row)
        for cell in row_it:
            if cell is not None:
                yield itertools.chain((cell,), row_it)
                break

def read_workbook(file_):
    wb = load_workbook(file_, data_only=True)
    for sheet in wb.worksheets:
        ws = sheet
    return DataFrame(get_data(ws))

path = r'dir'
allFiles = glob.glob(path + "/*.xlsm")
frame = pd.DataFrame()
list_ = []
for file_ in allFiles:
    parsed_file = read_workbook(file_)
    parsed_file['filename'] = os.path.basename(file_)
    parsed_file.set_index(['0'], inplace = True)
    parsed_file.index.str.strip()
    try:
        parsed_file.loc["Staff" : "Total"].copy()
        list_.append(parsed_file)
    except KeyError:
        pass

frame = pd.concat(list_)
print(frame.dropna(axis='columns', thresh=2, inplace = True))

Example DataFrame, showing the column that should become the index and the labels to extract between:

     index
     0          1   2 
0    5          2   4
1    RTJHD      5   9
2    ABCD       4   6
3    Staff      9   3 --- extract from here
4    FHDHSK     3   2
5    IRRJWK     7   1
6    FJDDCN     1   8
7    67         4   7
8    Total      5   3 --- to here

Error

Traceback (most recent call last):

  File "<ipython-input-29-d8fd24ca84ec>", line 1, in <module>
    runfile('dir.py', wdir='C:/dir/Documents')

  File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)

  File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)

  File "dir.py", line 36, in <module>
    parsed_file.set_index(['0'], inplace = True)

  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\frame.py", line 2830, in set_index
    level = frame[col]._values

  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\frame.py", line 1964, in __getitem__
    return self._getitem_column(key)

  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\frame.py", line 1971, in _getitem_column
    return self._get_item_cache(key)

  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\generic.py", line 1645, in _get_item_cache
    values = self._data.get(item)

  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\internals.py", line 3590, in get
    loc = self.items.get_loc(item)

  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\indexes\base.py", line 2444, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))

  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5280)

  File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5126)

  File "pandas\_libs\hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20523)

  File "pandas\_libs\hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20477)

KeyError: '0'

Answer 1

Answered by cs95

You're receiving this error because your DataFrame is read in without any headers. This means the column labels are integers rather than strings, held in an Int64Index, so the string '0' does not match any of them:

Int64Index([0, 1, 2, 3, ...], dtype='int64')
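
To make the mismatch concrete, here is a minimal sketch that reproduces it. The rows are hypothetical stand-ins for one parsed worksheet, and the exact index class pandas reports for the columns may differ between versions; what matters is that the labels are integers.

```python
import pandas as pd

# Hypothetical rows standing in for one parsed worksheet (no header row).
rows = [("ABCD", 4, 6), ("Staff", 9, 3), ("Total", 5, 3)]
df = pd.DataFrame(rows)

# With no headers supplied, the column labels are the integers 0, 1, 2.
print(list(df.columns))

# The string '0' is not one of those labels, so the lookup fails.
try:
    df.set_index(["0"])
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)

# The integer 0 is a valid label.
indexed = df.set_index(0)
print(indexed)
```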

At this point, I would recommend just accessing df.columns by position wherever you're forced to deal with them:

parsed_file.set_index(parsed_file.columns[0], inplace = True)
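
As a rough sketch of how that line fits into the rest of the workflow from the question, assuming a small hypothetical frame and a made-up filename: note that Index.str.strip() returns a new Index rather than modifying in place, so the result is assigned back here before slicing by label.

```python
import pandas as pd

# Hypothetical worksheet data with stray whitespace in the label column.
rows = [(" ABCD", 4, 6), ("Staff ", 9, 3), ("FHDHSK", 3, 2), (" Total", 5, 3)]
parsed_file = pd.DataFrame(rows)
parsed_file["filename"] = "example.xlsm"  # made-up filename for illustration

# Refer to the first column by position instead of hardcoding its label.
parsed_file = parsed_file.set_index(parsed_file.columns[0])

# str.strip() returns a new Index, so assign it back to keep the result.
parsed_file.index = parsed_file.index.str.strip()

# Label-based slicing includes both endpoints.
subset = parsed_file.loc["Staff":"Total"]
print(subset)
```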

Don't hardcode your column names if you're accessing them by position. The alternative is to assign column names of your own and reference those instead.
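
A minimal sketch of that alternative, assuming three columns; the names "label", "value_a" and "value_b" are invented for illustration and are not from the original data.

```python
import pandas as pd

# Hypothetical rows standing in for one parsed worksheet.
rows = [("ABCD", 4, 6), ("Staff", 9, 3), ("Total", 5, 3)]

# Assign explicit column names up front (names are illustrative only).
df = pd.DataFrame(rows, columns=["label", "value_a", "value_b"])

# The index can now be set by a name you control.
df = df.set_index("label")
print(df.loc["Staff", "value_a"])
```

Naming the columns once at construction keeps later code readable, at the cost of needing to know the number of columns in advance.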
