Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/24160227/

Date: 2020-09-13 22:09:11  Source: igfitidea

KeyError: Not in index, using keys generated from a Pandas DataFrame on itself

Tags: python-2.7, pandas, keyerror

Asked by Jason

I have two columns in a Pandas DataFrame that has a datetime as its index. The two columns contain data measuring the same parameter, but neither column is complete (some rows have no data at all, some rows have data in both columns, and others have data only in column 'a' or 'b').

I've written the following code to find gaps in the columns, generate a list of the date indices where these gaps appear, and use this list to find and replace the missing data. However, I get a KeyError: Not in index on line 3, which I don't understand because the keys I'm using to index came from the DataFrame itself. Could somebody explain why this is happening and what I can do to fix it? Here's the code:

def merge_func(df):
    null_index = df[(df['DOC_mg/L'].isnull() == False) & (df['TOC_mg/L'].isnull() == True)].index
    df['TOC_mg/L'][null_index] = df[null_index]['DOC_mg/L']
    notnull_index = df[(df['DOC_mg/L'].isnull() == True) & (df['TOC_mg/L'].isnull() == False)].index
    df['DOC_mg/L'][notnull_index] = df[notnull_index]['TOC_mg/L']

    df.insert(len(df.columns), 'Mean_mg/L', 0.0)
    df['Mean_mg/L'] = (df['DOC_mg/L'] + df['TOC_mg/L']) / 2
    return df

merge_func(sve)

Accepted answer by EdChum

Whenever you perform an assignment, you should use .loc:

df.loc[null_index, 'TOC_mg/L'] = df['DOC_mg/L']

The error in your original code is the ordering of the subscript values for the index lookup:

df['TOC_mg/L'][null_index] = df[null_index]['DOC_mg/L']

This will produce an index error. On a toy dataset I get: IndexError: indices are out-of-bounds
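A minimal sketch of why the lookup order matters, assuming a small toy frame (the dates and values are invented for illustration; only the column names come from the question). Passing a list of row labels to `df[...]` makes pandas look for them among the column names, so the lookup fails. The exact exception type and message depend on the pandas version.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration; only the column names come from the question.
idx = pd.date_range("2014-01-01", periods=4, freq="D")
df = pd.DataFrame(
    {
        "DOC_mg/L": [1.0, 2.0, np.nan, np.nan],
        "TOC_mg/L": [np.nan, 5.0, 6.0, np.nan],
    },
    index=idx,
)

# Row labels where DOC is present but TOC is missing.
null_index = df[df["DOC_mg/L"].notnull() & df["TOC_mg/L"].isnull()].index

# df[null_index] treats the dates as column names, so the lookup fails.
try:
    df[null_index]["DOC_mg/L"]
    lookup_failed = False
except (KeyError, IndexError) as exc:
    lookup_failed = True
    print(type(exc).__name__)

# Selecting the column first and then the rows by label works.
values = df["DOC_mg/L"][null_index]
print(values.tolist())  # [1.0]
```

Selecting the column first returns a Series indexed by the dates, which is why the row labels are found in the second form.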

If you changed the order to the following, it would probably work:

df['TOC_mg/L'][null_index] = df['DOC_mg/L'][null_index]

However, this is chained assignment and should be avoided; see the online pandas docs.
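A sketch of why chained assignment is fragile, using invented toy data. The first indexing step may return a copy, and the assignment then lands on that temporary object instead of the original frame. The example uses a boolean mask for the first step, a form that returns a copy; whether the column-first form from the question writes through depends on the pandas version and settings.

```python
import warnings

import numpy as np
import pandas as pd

# Toy data, invented for illustration.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [10.0, 20.0, 30.0]})
mask = df["a"].isnull()

# Chained assignment: df[mask] builds a new object, and the value is written to it.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas warns about this pattern
    df[mask]["a"] = df["b"]

# The original frame still has its gap.
chained_changed_df = bool(df["a"].notnull().all())

# Single .loc call: rows and column are selected in one step on df itself.
df.loc[mask, "a"] = df["b"]
loc_changed_df = bool(df["a"].notnull().all())

print(chained_changed_df, loc_changed_df)  # False True
```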

So you should use loc:

df.loc[null_index, 'TOC_mg/L'] = df['DOC_mg/L']
df.loc[notnull_index, 'DOC_mg/L'] = df['TOC_mg/L']
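Putting the pieces together, one possible rewrite of the question's `merge_func` using `.loc` might look like the sketch below. The boolean masks, the `copy()` call, and the toy data are assumptions added for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd


def merge_func(df):
    """Fill gaps in each column from the other, then average the two."""
    df = df.copy()  # assumption: leave the caller's frame untouched

    # Rows where exactly one of the two columns has a value.
    doc_only = df["DOC_mg/L"].notnull() & df["TOC_mg/L"].isnull()
    toc_only = df["DOC_mg/L"].isnull() & df["TOC_mg/L"].notnull()

    # One .loc call per assignment; the right-hand side aligns on the index.
    df.loc[doc_only, "TOC_mg/L"] = df["DOC_mg/L"]
    df.loc[toc_only, "DOC_mg/L"] = df["TOC_mg/L"]

    df["Mean_mg/L"] = (df["DOC_mg/L"] + df["TOC_mg/L"]) / 2
    return df


# Toy data, invented for illustration.
sve = pd.DataFrame(
    {
        "DOC_mg/L": [1.0, 2.0, np.nan, np.nan],
        "TOC_mg/L": [np.nan, 4.0, 6.0, np.nan],
    },
    index=pd.date_range("2014-01-01", periods=4, freq="D"),
)

result = merge_func(sve)
print(result)
```

Rows where both columns are missing stay missing, so the mean for those rows is NaN as well.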

Note that it is not necessary to use the same index for the right-hand side, as it will align correctly.
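The alignment behaviour can be seen in a small sketch with invented data: the right-hand side Series below covers more labels than the target rows and lists them in a different order, yet each value ends up on the row with the matching label.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration.
df = pd.DataFrame({"a": [np.nan, 2.0, np.nan]}, index=["x", "y", "z"])

# Right-hand side in a different order, with a label "w" that is not in df.
rhs = pd.Series([30.0, 99.0, 10.0, 20.0], index=["z", "w", "x", "y"])

# Only rows "x" and "z" are targeted; pandas matches rhs to them by label.
missing = df.index[df["a"].isnull()]
df.loc[missing, "a"] = rhs

print(df["a"].tolist())  # [10.0, 2.0, 30.0]
```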