Cannot get right slice bound for non-unique label when indexing data frame with python-pandas
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/37935294/
Asked by user5779223
I have a data frame df like this:
a b
10 2
3 1
0 0
0 4
....
# about 50,000+ rows
I wish to select df[:5, 'a']. But when I call df.loc[:5, 'a'], I get an error: KeyError: 'Cannot get right slice bound for non-unique label: 5'. When I call df.loc[5], the result contains 250 rows, while there is just one when I use df.iloc[5]. Why does this happen, and how can I index it properly? Thank you in advance!
Answered by Stefan
The error message is explained here: if the index is not monotonic, then both slice bounds must be unique members of the index.
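As a rough illustration of that rule, here is a minimal sketch that reproduces the error. The four-row frame and its index values are made up for the example (the question never shows the real index); the only assumption is that the label 5 occurs more than once in an unsorted index.

```python
import pandas as pd

# Hypothetical toy frame: the label 5 appears twice and the index is unsorted.
df = pd.DataFrame(
    {"a": [10, 3, 0, 0], "b": [2, 1, 0, 4]},
    index=[5, 1, 5, 2],
)

# Both properties are False here, which is the situation the rule describes.
is_monotonic = df.index.is_monotonic_increasing
is_unique = df.index.is_unique

# Slicing up to the duplicated label 5 is ambiguous, so pandas raises KeyError.
try:
    df.loc[:5, "a"]
    error_message = None
except KeyError as exc:
    error_message = str(exc)

print(is_monotonic, is_unique)
print(error_message)
```

Checking df.index.is_monotonic_increasing and df.index.is_unique on the real data frame is a quick way to confirm whether this is what is happening.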
The difference between .loc and .iloc is label-based vs. integer-position-based indexing (see the docs). .loc is intended to select individual labels or slices of labels. That's why .loc[5] selects all rows where the index has the value 5 (250 rows in your case, which is also why the error mentions a non-unique index). .iloc, in contrast, selects row number 5 (0-indexed). That's why you only get a single row, and its index value may or may not be 5. Hope this helps!
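The contrast can be sketched on a small frame. The data and index values below are hypothetical, chosen only so that the label 5 repeats; with the asker's real data the counts would differ. If the goal is simply the first five rows of column a, positional indexing with .iloc is one way to get them.

```python
import pandas as pd

# Hypothetical toy frame in which the label 5 occurs three times.
df = pd.DataFrame(
    {"a": [10, 3, 0, 0, 7, 1], "b": [2, 1, 0, 4, 9, 8]},
    index=[5, 1, 5, 2, 5, 3],
)

# Label-based: every row whose index label equals 5 (a DataFrame here).
by_label = df.loc[5]

# Position-based: the single row at position 5, i.e. the sixth row (a Series).
by_position = df.iloc[5]

# First five rows of column "a" by position, regardless of the labels.
first_five = df["a"].iloc[:5]

print(len(by_label))
print(by_position.name)
print(first_five.tolist())
```

Note that the row returned by df.iloc[5] carries the label 3 in this toy frame, not 5.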
Answered by Sujith Rao
The problem with the way you are indexing is that multiple rows have the index 5, so .loc does not know which one to use as the slice bound. To see this, run df.loc[5]; you will get all the rows that share that index. You can either sort the index using sort_index, or first aggregate the data by index and then retrieve it. Hope this helps.
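Both suggestions can be sketched as follows. The toy frame is hypothetical, and summing the duplicated rows is an arbitrary choice of aggregation made for the example; the answer does not say which aggregation is appropriate.

```python
import pandas as pd

# Hypothetical toy frame: duplicated label 5 in an unsorted index.
df = pd.DataFrame(
    {"a": [10, 3, 0, 0], "b": [2, 1, 0, 4]},
    index=[5, 1, 5, 2],
)

# Option 1: sort the index so it is monotonic. A label slice then works
# and includes every row labelled 5.
sorted_slice = df.sort_index().loc[:5, "a"]

# Option 2: aggregate rows that share a label (sum is an assumption here),
# which leaves a unique index that can be sliced by label.
aggregated = df.groupby(level=0).sum()
aggregated_slice = aggregated.loc[:5, "a"]

print(sorted_slice)
print(aggregated_slice)
```

With a sorted index, .loc[:5] returns all rows with labels up to and including 5, which is not the same as the first five rows by position.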