Original question: http://stackoverflow.com/questions/51445631/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Pandas indexing and Key error
Asked by Yash
Consider the following:
import pandas as pd

d = {'a': 0.0, 'b': 1.0, 'c': 2.0}
e = pd.Series(d, index=['a', 'b', 'c'])
df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})
When I try to access the first row of column B in the following way:
>>> df.B[0]
0.0
I get the correct result.
However, after reading "KeyError: 0 when accessing value in pandas series", I was under the assumption that, since I have specified the index as 'a', 'b' and 'c', the correct way to access the first row of column B by position is df.B.iloc[0], and that df.B[0] should raise a KeyError. I don't know what I am missing. Can someone clarify in which cases I get a KeyError?
Answered by Justinas Marozas
The problem in the question you referenced is that the index of the given DataFrame is made of integers but does not start from 0.
Pandas behaviour when asking for df.B[0] is ambiguous and depends on the data type of the index and the data type of the value passed to Python's indexing ([]) syntax. It can behave like df.B.loc[0] (index label based) or df.B.iloc[0] (position based), or possibly in some other way I'm not aware of. For predictable behaviour I recommend using loc and iloc.
To illustrate this with your example:
import pandas as pd

d = [0.0, 1.0, 2.0]
e = pd.Series(d, index=['a', 'b', 'c'])
df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})
df.B[0]          # 0.0 - falls back to position based (deprecated since pandas 2.1)
df.B['0']        # KeyError - no label '0' in index
df.B['a']        # 0.0 - found label 'a' in index
df.B.loc[0]      # TypeError - string index queried by integer value (KeyError since pandas 1.1)
df.B.loc['0']    # KeyError - no label '0' in index
df.B.loc['a']    # 0.0 - found label 'a' in index
df.B.iloc[0]     # 0.0 - position based query for row 0
df.B.iloc['0']   # TypeError - string can't be used for position
df.B.iloc['a']   # TypeError - string can't be used for position
With the example from the referenced question:
d = [0.0, 1.0, 2.0]
e = pd.Series(d, index = [4, 5, 6])
df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})
df.B[0] # KeyError - label 0 not in index
df.B['0'] # KeyError - label '0' not in index
df.B.loc[0] # KeyError - label 0 not in index
df.B.loc['0'] # KeyError - label '0' not in index
df.B.iloc[0] # 0.0 - position based query for row 0
df.B.iloc['0'] # TypeError - string can't be used for position
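The listings above stop at the first exception if run top to bottom. As a minimal sketch (not part of the original answer), the outcomes can be collected with a small helper; the name probe is hypothetical. It only exercises loc and iloc, since the result of plain df.B[0] depends on the fallback rules described above and on the pandas version.

```python
import pandas as pd


def probe(indexer, key):
    """Return the looked-up value, or the name of the exception raised."""
    try:
        return indexer[key]
    except (KeyError, TypeError) as exc:
        return type(exc).__name__


labelled = pd.Series([0.0, 1.0, 2.0], index=['a', 'b', 'c'])
shifted = pd.Series([0.0, 1.0, 2.0], index=[4, 5, 6])

# .iloc is always positional, whatever the index looks like
print(probe(labelled.iloc, 0))
print(probe(shifted.iloc, 0))

# .loc is always label based
print(probe(labelled.loc, 'a'))
print(probe(shifted.loc, 4))
print(probe(shifted.loc, 0))     # label 0 does not exist in [4, 5, 6]

# A string can never be used as a position
print(probe(labelled.iloc, 'a'))
```

Because iloc only ever sees positions and loc only ever sees labels, neither call changes meaning when the index changes from strings to integers.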
Answered by xyzjayne
df.B returns a pandas Series, which is why you can do positional indexing. If you select column B as a DataFrame, this will throw an error:
df[['B']][0]
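A short sketch of that difference, using a cut-down frame with only column B (an assumption made for brevity): single brackets give a Series, a list of column names gives a DataFrame, and [] on a DataFrame looks up column labels rather than rows.

```python
import pandas as pd

df = pd.DataFrame({'B': [0.0, 1.0, 2.0]}, index=['a', 'b', 'c'])

as_series = df['B']      # single brackets -> pandas Series
as_frame = df[['B']]     # list of column names -> pandas DataFrame

print(type(as_series).__name__)
print(type(as_frame).__name__)

# [] on a DataFrame looks up column labels, and there is no column named 0
try:
    as_frame[0]
    outcome = 'no error'
except KeyError:
    outcome = 'KeyError'
print(outcome)

# Positional access works on both through .iloc
first_from_series = as_series.iloc[0]
first_from_frame = as_frame.iloc[0, 0]   # row 0, column 0
print(first_from_series, first_from_frame)
```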
Answered by NiGiord
df.B is actually a pandas.Series object (a shortcut for df['B']), which can be iterated. df.B[0] is no longer a "row" but just the first element of df.B, since by writing df.B you basically create a 1-D object.
More information is available in the data structure documentation:
You can treat a DataFrame semantically like a dict of like-indexed Series objects.
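The dict analogy can be illustrated with a small sketch; the variable names are chosen here for illustration, and the frame is a reduced version of the one in the question.

```python
import pandas as pd

e = pd.Series([0.0, 1.0, 2.0], index=['a', 'b', 'c'])
df = pd.DataFrame({'A': 1.0, 'B': e})

# Column labels act like dict keys
column_names = list(df.keys())
has_b = 'B' in df

# Attribute access and key access return the same column
same_column = df.B.equals(df['B'])

# Every column is a Series that shares the frame's row index
shared_index = all(df[name].index.equals(df.index) for name in df)

print(column_names, has_b, same_column, shared_index)
```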