Notice: this page is adapted from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/51445631/

Date: 2020-09-14 05:49:24  Source: igfitidea

Pandas indexing and Key error

Tags: python, pandas, indexing

Asked by Yash

Consider the following:

import pandas as pd

d = {'a': 0.0, 'b': 1.0, 'c': 2.0}
e = pd.Series(d, index=['a', 'b', 'c'])
df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})

When I try to access the first row of column B in the following way:

>>> df.B[0]
0.0

I get the correct result.

However, after reading "KeyError: 0 when accessing value in pandas series", I was under the assumption that, since I have specified the index as 'a', 'b' and 'c', the correct way to access the first row of column B by position is df.B.iloc[0], and that df.B[0] should raise a KeyError. I don't know what I am missing. Can someone clarify in which cases I would get a KeyError?

Answered by Justinas Marozas

The problem in the question you referenced is that the index of the given DataFrame holds integers but does not start from 0.

Pandas behaviour when asking for df.B[0] is ambiguous: it depends on the data type of the index and on the data type of the value passed inside the square brackets. It can behave like df.B.loc[0] (index label based) or df.B.iloc[0] (position based), or possibly something else I'm not aware of. For predictable behaviour I recommend using loc and iloc.

To illustrate this with your example:

import pandas as pd

d = [0.0, 1.0, 2.0]
e = pd.Series(d, index=['a', 'b', 'c'])
df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})

df.B[0]         # 0.0 - falls back to position based (newer pandas versions warn about this fallback)
df.B['0']       # KeyError - no label '0' in index
df.B['a']       # 0.0 - found label 'a' in index
df.B.loc[0]     # TypeError - string index queried by integer value (newer pandas versions raise KeyError instead)
df.B.loc['0']   # KeyError - no label '0' in index
df.B.loc['a']   # 0.0 - found label 'a' in index
df.B.iloc[0]    # 0.0 - position based query for row 0
df.B.iloc['0']  # TypeError - string can't be used for position
df.B.iloc['a']  # TypeError - string can't be used for position
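
Because some of these outcomes differ between pandas versions, it can help to probe them on your own installation. The following is a minimal sketch; the helper name `outcome` is hypothetical (not part of pandas), and it only reports whether a lookup succeeded or which exception type it raised.

```python
import pandas as pd

s = pd.Series([0.0, 1.0, 2.0], index=["a", "b", "c"])


def outcome(lookup):
    """Hypothetical helper: run a lookup and report its value or exception type."""
    try:
        return ("ok", lookup())
    except Exception as exc:  # broad on purpose: we only want the exception's name
        return ("error", type(exc).__name__)


# Each entry pairs a description with a deferred lookup.
lookups = {
    "s.loc['a']": lambda: s.loc["a"],    # label present in the index
    "s.loc['0']": lambda: s.loc["0"],    # label absent from the index
    "s.loc[0]": lambda: s.loc[0],        # integer label on a string index
    "s.iloc[0]": lambda: s.iloc[0],      # valid position
    "s.iloc['a']": lambda: s.iloc["a"],  # a string is not a position
}

for description, lookup in lookups.items():
    print(description, "->", outcome(lookup))
```

Only `loc` and `iloc` are probed here, since those are the accessors recommended above for predictable behaviour.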

With the example from the referenced question:

d = [0.0, 1.0, 2.0]
e = pd.Series(d, index = [4, 5, 6])
df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})

df.B[0] # KeyError - label 0 not in index
df.B['0'] # KeyError - label '0' not in index
df.B.loc[0] # KeyError - label 0 not in index
df.B.loc['0'] # KeyError - label '0' not in index
df.B.iloc[0] # 0.0 - position based query for row 0
df.B.iloc['0'] # TypeError - string can't be used for position
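
For an integer index that does not start at 0, the contrast between labels and positions can be sketched as follows. The `reset_index(drop=True)` step is an optional extra that neither the question nor this answer uses; it is shown only as one possible way to make labels and positions agree.

```python
import pandas as pd

# Integer labels that do not start at 0, as in the referenced question.
s = pd.Series([0.0, 1.0, 2.0], index=[4, 5, 6])

first_by_label = s.loc[4]       # label 4 happens to be the first row
first_by_position = s.iloc[0]   # position 0 is always the first row
has_label_zero = 0 in s.index   # there is no label 0, hence the KeyError

# Optional: renumber the labels 0..n-1 so that labels and positions coincide.
renumbered = s.reset_index(drop=True)
first_after_reset = renumbered.loc[0]

print(first_by_label, first_by_position, has_label_zero, first_after_reset)
```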

Answered by xyzjayne

df.B returns a pandas Series, which is why you can do positional indexing. If you select column B as a DataFrame instead, the same lookup raises an error:

df[['B']][0]
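
A small sketch of that difference, reusing the question's data (the variable names are illustrative): square brackets on a DataFrame look up a column label, so `0` is searched among the columns and not among the rows.

```python
import pandas as pd

e = pd.Series([0.0, 1.0, 2.0], index=["a", "b", "c"])
df = pd.DataFrame({"A": 1.0, "B": e, "C": pd.Timestamp("20130102")})

col_series = df["B"]    # single label: a Series
col_frame = df[["B"]]   # list of labels: a one-column DataFrame

# On a DataFrame, [] selects a column, and there is no column labeled 0.
try:
    col_frame[0]
    frame_lookup = "no error"
except KeyError:
    frame_lookup = "KeyError"

# Positional access works on both objects through iloc.
first_from_series = col_series.iloc[0]
first_from_frame = col_frame.iloc[0, 0]  # row position 0, column position 0

print(type(col_series).__name__, type(col_frame).__name__, frame_lookup)
```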

Answered by NiGiord

df.B is actually a pandas.Series object (a shortcut for df['B']), which can be iterated. df.B[0] is no longer a "row" but just the first element of df.B, since by writing df.B you basically create a 1-D object.

More information is available in the data structure documentation:

You can treat a DataFrame semantically like a dict of like-indexed Series objects.
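
The dict analogy can be sketched roughly like this, assuming a small frame built from two Series that share one index (the names `columns` and `kinds` are illustrative):

```python
import pandas as pd

index = ["a", "b", "c"]
columns = {
    "A": pd.Series(1.0, index=index),
    "B": pd.Series([0.0, 1.0, 2.0], index=index),
}
df = pd.DataFrame(columns)

# Dict-like behaviour: iteration and membership work on column labels.
column_labels = list(df)
has_b = "B" in df

# Each "value" in the dict-like view is a Series sharing the frame's index.
kinds = {name: type(col).__name__ for name, col in df.items()}
b_values = list(df["B"])

print(column_labels, has_b, kinds, b_values)
```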