Original question: http://stackoverflow.com/questions/24160227/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
KeyError: Not in index, using keys generated from a Pandas DataFrame on itself
Asked by Jason
I have two columns in a Pandas DataFrame that has a datetime as its index. The two columns contain data measuring the same parameter, but neither column is complete (some rows have no data at all, some rows have data in both columns, and others have data only in column 'a' or 'b').
I've written the following code to find gaps in the columns, generate a list of index dates where these gaps appear, and use this list to find and replace the missing data. However, I get a KeyError: Not in index on line 3, which I don't understand because the keys I'm using to index came from the DataFrame itself. Could somebody explain why this is happening and what I can do to fix it? Here's the code:
def merge_func(df):
    null_index = df[(df['DOC_mg/L'].isnull() == False) & (df['TOC_mg/L'].isnull() == True)].index
    df['TOC_mg/L'][null_index] = df[null_index]['DOC_mg/L']
    notnull_index = df[(df['DOC_mg/L'].isnull() == True) & (df['TOC_mg/L'].isnull() == False)].index
    df['DOC_mg/L'][notnull_index] = df[notnull_index]['TOC_mg/L']
    df.insert(len(df.columns), 'Mean_mg/L', 0.0)
    df['Mean_mg/L'] = (df['DOC_mg/L'] + df['TOC_mg/L']) / 2
    return df

merge_func(sve)
Accepted answer by EdChum
Whenever you perform an assignment, you should use .loc:
df.loc[null_index, 'TOC_mg/L'] = df['DOC_mg/L']
The error in your original code is the ordering of the subscript values for the index lookup:
df['TOC_mg/L'][null_index] = df[null_index]['DOC_mg/L']
will produce an index error; on a toy dataset I get: IndexError: indices are out-of-bounds
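The failing lookup can be reproduced on a small frame. The sketch below uses hypothetical toy data (only the column names come from the question) and assumes a reasonably recent pandas, where df[null_index] is treated as a column selection and so raises a KeyError rather than the IndexError quoted above; the exact exception and message may differ between versions.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
idx = pd.date_range("2014-01-01", periods=4, freq="D")
df = pd.DataFrame(
    {"DOC_mg/L": [1.0, 2.0, None, None], "TOC_mg/L": [None, 4.0, 5.0, None]},
    index=idx,
)

# Row labels where DOC is present but TOC is missing.
null_index = df[df["DOC_mg/L"].notnull() & df["TOC_mg/L"].isnull()].index

# df[null_index] looks the dates up among the *column* labels, so it fails.
try:
    df[null_index]
    raised = False
except KeyError:
    raised = True

# Selecting the column first and the rows second is a valid lookup.
selected = df["DOC_mg/L"][null_index]
print(raised, selected.tolist())
```

The dates exist in the row index, but a list-like key passed to df[...] is matched against the columns, which is why keys taken from the DataFrame itself are still reported as missing.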
If you changed the order to this, it would probably work:
df['TOC_mg/L'][null_index] = df['DOC_mg/L'][null_index]
However, this is chained assignment and should be avoided; see the online docs.
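As a rough illustration of why chained assignment is unreliable, the sketch below uses a hypothetical two-column frame and a variant of the pattern in which the rows are filtered first. In that variant the write lands on a temporary copy and is lost; whether the column-first form shown above works depends on the pandas version, so this is one example of the problem, not a full description of it.

```python
import warnings

import pandas as pd

# Hypothetical toy frame, not the asker's data.
df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [10.0, 20.0, 30.0]})
mask = df["a"].isnull()

# Chained form: df[mask] builds a temporary copy, so the write is lost.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the chained-assignment warning
    df[mask]["a"] = 0.0
still_missing = bool(df["a"].isnull().any())

# Single .loc call: rows and column are resolved on df itself.
df.loc[mask, "a"] = df["b"]
print(still_missing, df["a"].tolist())
```

With .loc there is one indexing operation on the original object, so there is no intermediate result for the assignment to get lost in.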
So you should use .loc:
df.loc[null_index, 'TOC_mg/L'] = df['DOC_mg/L']
df.loc[notnull_index, 'DOC_mg/L'] = df['TOC_mg/L']
Note that it is not necessary to use the same index for the right-hand side, as it will align correctly.
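Putting the pieces together, one possible rewrite of merge_func is sketched below. It makes some assumptions that are mine, not the answer's: boolean masks stand in for the null_index and notnull_index lists, the function works on a copy and leaves the input untouched, and the sve frame is filled with hypothetical toy values.

```python
import pandas as pd


def merge_func(df):
    """Fill each column's gaps from the other, then add their mean."""
    df = df.copy()  # assumption: leave the caller's frame untouched
    doc_only = df["DOC_mg/L"].notnull() & df["TOC_mg/L"].isnull()
    toc_only = df["DOC_mg/L"].isnull() & df["TOC_mg/L"].notnull()
    # The right-hand side is aligned on the index, so it needs no row filter.
    df.loc[doc_only, "TOC_mg/L"] = df["DOC_mg/L"]
    df.loc[toc_only, "DOC_mg/L"] = df["TOC_mg/L"]
    df["Mean_mg/L"] = (df["DOC_mg/L"] + df["TOC_mg/L"]) / 2
    return df


# Hypothetical stand-in for the asker's `sve` frame.
idx = pd.date_range("2014-01-01", periods=4, freq="D")
sve = pd.DataFrame(
    {"DOC_mg/L": [1.0, 2.0, None, None], "TOC_mg/L": [None, 4.0, 5.0, None]},
    index=idx,
)
result = merge_func(sve)
print(result)
```

Rows where both columns are missing stay NaN in all three columns, since there is nothing to fill them from.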

