KeyError: Not in index, using keys generated from the Pandas DataFrame itself

Question

Asked by Jason

I have two columns in a Pandas DataFrame that has datetime as its index. The two columns contain data measuring the same parameter, but neither column is complete (some rows have no data at all, some rows have data in both columns, and others have data only in column 'a' or 'b').

I've written the following code to find gaps in the columns, generate a list of the date indices where these gaps appear, and use this list to find and replace missing data. However, I get a KeyError: Not in index on line 3, which I don't understand because the keys I'm using to index came from the DataFrame itself. Could somebody explain why this is happening and what I can do to fix it? Here's the code:

def merge_func(df):
    null_index = df[(df['DOC_mg/L'].isnull() == False) & (df['TOC_mg/L'].isnull() == True)].index
    df['TOC_mg/L'][null_index] = df[null_index]['DOC_mg/L']
    notnull_index = df[(df['DOC_mg/L'].isnull() == True) & (df['TOC_mg/L'].isnull() == False)].index
    df['DOC_mg/L'][notnull_index] = df[notnull_index]['TOC_mg/L']

    df.insert(len(df.columns), 'Mean_mg/L', 0.0)
    df['Mean_mg/L'] = (df['DOC_mg/L'] + df['TOC_mg/L']) / 2
    return df

merge_func(sve)

Answer 1

Accepted answer by EdChum

Whenever you perform an assignment, you should use .loc:

df.loc[null_index, 'TOC_mg/L'] = df['DOC_mg/L']

The error in your original code is the ordering of the subscript values for the index lookup:

df['TOC_mg/L'][null_index] = df[null_index]['DOC_mg/L']

This line produces an index error; on a toy dataset I get: IndexError: indices are out-of-bounds
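To illustrate why the subscript order matters, here is a minimal sketch on a made-up four-row frame (the dates and values are assumptions, not the asker's data). Indexing a DataFrame with a list-like of labels, as in df[null_index], looks the labels up among the columns, whereas indexing a single column with the same labels looks them up among the rows. The exact exception type and message depend on the pandas version, so the sketch accepts either KeyError or IndexError.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a datetime-indexed frame with gaps in both columns.
idx = pd.date_range("2020-01-01", periods=4, freq="D")
df = pd.DataFrame(
    {
        "DOC_mg/L": [1.0, np.nan, 3.0, np.nan],
        "TOC_mg/L": [np.nan, 2.0, 3.5, np.nan],
    },
    index=idx,
)

# Row labels where DOC is present but TOC is missing.
null_index = df[df["DOC_mg/L"].notnull() & df["TOC_mg/L"].isnull()].index

# df[labels] treats the labels as COLUMN names, so datetime row labels fail.
try:
    df[null_index]
    error_name = None
except (KeyError, IndexError) as exc:
    error_name = type(exc).__name__

# Selecting the column first means the labels are looked up in the row index.
column_then_rows = df["DOC_mg/L"][null_index]

# The same read expressed with .loc, naming rows and column explicitly.
loc_rows = df.loc[null_index, "DOC_mg/L"]

print(error_name)
print(column_then_rows.tolist(), loc_rows.tolist())
```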

If you changed the order to this, it would probably work:

df['TOC_mg/L'][null_index] = df['DOC_mg/L'][null_index]

However, this is chained assignment and should be avoided; see the online docs.

So you should use loc:

df.loc[null_index, 'TOC_mg/L'] = df['DOC_mg/L']
df.loc[notnull_index, 'DOC_mg/L'] = df['TOC_mg/L']

Note that it is not necessary to use the same index for the right-hand side, as it will align correctly.
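Putting the pieces together, the following is one possible rewrite of the question's merge_func using .loc, run on made-up data (the dates and values are assumptions). It works on a copy so the input frame is left untouched, which is a design choice of this sketch rather than something the answer prescribes, and it assumes the row index is unique. The right-hand sides are whole columns; pandas aligns them on the index, so only the selected rows are filled.

```python
import numpy as np
import pandas as pd


def merge_func(df):
    """Fill gaps in each column from the other, then average the two."""
    out = df.copy()  # assumption: leave the caller's frame unmodified

    # Row labels where only one of the two measurements is present.
    null_index = out[out["DOC_mg/L"].notnull() & out["TOC_mg/L"].isnull()].index
    notnull_index = out[out["DOC_mg/L"].isnull() & out["TOC_mg/L"].notnull()].index

    # .loc names rows and column in one step; the right-hand Series is
    # aligned on the index, so it does not need to be subset first.
    out.loc[null_index, "TOC_mg/L"] = out["DOC_mg/L"]
    out.loc[notnull_index, "DOC_mg/L"] = out["TOC_mg/L"]

    out["Mean_mg/L"] = (out["DOC_mg/L"] + out["TOC_mg/L"]) / 2
    return out


# Hypothetical toy data covering all four cases from the question.
idx = pd.date_range("2020-01-01", periods=4, freq="D")
sve = pd.DataFrame(
    {
        "DOC_mg/L": [1.0, np.nan, 3.0, np.nan],
        "TOC_mg/L": [np.nan, 2.0, 3.5, np.nan],
    },
    index=idx,
)

result = merge_func(sve)
print(result)
```

Rows where both columns are missing stay NaN in all three columns, since there is nothing to fill them from.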
