Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/37935294/

Date: 2020-09-14 01:25:44  Source: igfitidea

Cannot get right slice bound for non-unique label when indexing data frame with python-pandas

python pandas dataframe

Asked by user5779223

I have a data frame df like this:

a         b
10        2
3         1
0         0
0         4
....
# about 50,000+ rows

I want to select df[:5, 'a']. But when I call df.loc[:5, 'a'], I get an error: KeyError: 'Cannot get right slice bound for non-unique label: 5'. When I call df.loc[5], the result contains 250 rows, while there is just one row when I use df.iloc[5]. Why does this happen, and how can I index it properly? Thank you in advance!

Answered by Stefan

The error message is explained as follows: if the index is not monotonic, then both slice bounds must be unique members of the index.
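A minimal sketch that reproduces the error on a small frame. The data and the index values here are made up for illustration (the question does not show the real index); the only assumption is that the label 5 occurs more than once and the index is not sorted.

```python
import pandas as pd

# Hypothetical data: the label 5 appears twice and the index is not sorted.
df = pd.DataFrame(
    {"a": [10, 3, 0, 0], "b": [2, 1, 0, 4]},
    index=[5, 1, 5, 2],
)

# Both conditions named in the explanation fail for this index.
index_is_unique = df.index.is_unique
index_is_monotonic = df.index.is_monotonic_increasing

# Slicing up to a duplicated label on a non-monotonic index raises KeyError.
try:
    df.loc[:5, "a"]
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```

Checking df.index.is_unique and df.index.is_monotonic_increasing on the real frame is a quick way to confirm that this is the situation you are in.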

The difference between .loc and .iloc is label-based vs integer-position-based indexing (see the docs). .loc is intended to select individual labels or slices of labels. That's why .loc[5] selects all rows where the index has the value 5 (and why the error is about a non-unique index). .iloc, in contrast, selects the row at position 5 (0-indexed). That's why you only get a single row, and its index value may or may not be 5. Hope this helps!
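The label vs position distinction can be sketched on a small frame. The data and index below are hypothetical, chosen only so that the label 5 is duplicated; selecting the first rows by position with .iloc is one way to get what the question asks for, assuming "the first five rows" is what was meant.

```python
import pandas as pd

# Hypothetical data with a duplicated index label (5).
df = pd.DataFrame(
    {"a": [10, 3, 0, 0], "b": [2, 1, 0, 4]},
    index=[5, 1, 5, 2],
)

# Label-based: every row whose index label equals 5 (a DataFrame with 2 rows).
by_label = df.loc[5]

# Position-based: the single row at position 2 (a Series), whatever its label is.
by_position = df.iloc[2]

# First three rows of column "a" by position; the end position is excluded.
first_three = df["a"].iloc[:3]

print(by_label)
print(by_position)
print(first_three)
```

Note that a .loc label slice includes its end label, while an .iloc position slice excludes its end position, so the two are not interchangeable even when the index happens to be 0, 1, 2, ...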

Answered by Sujith Rao

The issue with the way you are indexing is that there are multiple rows whose index label is 5, so .loc cannot tell which one should be the slice bound. To check, just run df.loc[5]; you will get all the rows that share that index label. You can either sort the index using sort_index, or first aggregate the data by index and then retrieve it. Hope this helps.
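Both suggestions can be sketched as follows. The frame is hypothetical, and summing the duplicates is only an assumed choice of aggregation; the answer does not say which aggregation fits the data.

```python
import pandas as pd

# Hypothetical data with a duplicated, unsorted index.
df = pd.DataFrame(
    {"a": [10, 3, 0, 0], "b": [2, 1, 0, 4]},
    index=[5, 1, 5, 2],
)

# Option 1: sort the index. Once it is monotonic, label slicing works
# even with duplicates and returns every row whose label is <= 5.
sorted_df = df.sort_index()
up_to_5 = sorted_df.loc[:5, "a"]

# Option 2: aggregate rows that share a label (sum is an assumption here),
# which leaves a unique, sorted index that can be sliced by label.
summed = df.groupby(level=0).sum()
summed_up_to_5 = summed.loc[:5, "a"]

print(up_to_5)
print(summed_up_to_5)
```

Sorting keeps every original row but changes the row order; aggregating keeps one row per label. Which one is appropriate depends on what the duplicated labels mean in the data.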