pandas: How to find duplicate indices in a DataFrame
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/28014491/
How do I find duplicate indices in a DataFrame?
Asked by Pat Patterson
I have a pandas DataFrame with a multi-level index ("instance" and "index"). I want to find all the first-level ("instance") index values which are non-unique and to print out those values.
My frame looks like this:
                A
instance index
a        1     10
         2     12
         3      4
b        1     12
         2      5
         3      2
b        1     12
         2      5
         3      2
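To try the answers below, the frame can be rebuilt roughly as follows. This is a minimal sketch that assumes the level names "instance" and "index" and the column name "A" exactly as shown in the printout; the variable names `tuples` and `idx` are illustrative only.

```python
import pandas as pd

# Rebuild the example frame: every (instance, index) pair for "b" occurs twice.
tuples = [
    ("a", 1), ("a", 2), ("a", 3),
    ("b", 1), ("b", 2), ("b", 3),
    ("b", 1), ("b", 2), ("b", 3),
]
idx = pd.MultiIndex.from_tuples(tuples, names=["instance", "index"])
df = pd.DataFrame({"A": [10, 12, 4, 12, 5, 2, 12, 5, 2]}, index=idx)

# The repeated "b" rows make the index non-unique.
print(df.index.is_unique)  # False
```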
I want to find "b" as the duplicated level-0 index value and print that value ("b") out.
Answered by Alex Riley
You can use the get_duplicates() method:
>>> df.index.get_level_values('instance').get_duplicates()
[0, 1]
(In my example data, 0 and 1 both appear multiple times.)
The get_level_values() method accepts a label (such as 'instance') or an integer and retrieves the relevant level of the MultiIndex.
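Index.get_duplicates() was deprecated in pandas 0.23 and is no longer available from pandas 1.0 onward, so the call above fails on current versions. A rough equivalent can be sketched with Index.duplicated(). The data below is hypothetical, chosen only so that 0 and 1 repeat as in this answer's output; the answer's actual frame is not shown.

```python
import pandas as pd

# Hypothetical data: the "instance" level holds 0, 0, 1, 1, 2.
idx = pd.MultiIndex.from_arrays(
    [[0, 0, 1, 1, 2], [1, 2, 1, 2, 1]], names=["instance", "index"]
)
df = pd.DataFrame({"A": [10, 12, 4, 12, 5]}, index=idx)

level = df.index.get_level_values("instance")
# duplicated() flags every repeat after the first occurrence;
# unique() collapses the flagged values to one entry each.
dupes = level[level.duplicated()].unique()
print(dupes.tolist())  # [0, 1]
```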
Answered by Primer
Assuming that your df has an index made of 'instance' and 'index', you could do this:
df1 = df.reset_index().pivot_table(index=['instance','index'], values='A', aggfunc='count')
# dropna() discards the rows the mask leaves as NaN when pivot_table returns a DataFrame
df1[df1 > 1].dropna().index.get_level_values(0).drop_duplicates()
Which yields:
Index([u'b'], dtype='object')
Adding .values at the end (.drop_duplicates().values) returns an array instead:
array(['b'], dtype=object)
Or the same in one line using .groupby:
df[df.groupby(level=['instance','index']).count() > 1].dropna().index.get_level_values(0).drop_duplicates()
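The counting idea can also be written with groupby(...).size(), which returns a Series of row counts per (instance, index) pair and so avoids masking a DataFrame. This is a sketch on the question's frame, rebuilt here under the assumption that the level and column names match the printout; `counts`, `repeated`, and `dup_instances` are illustrative names.

```python
import pandas as pd

tuples = [
    ("a", 1), ("a", 2), ("a", 3),
    ("b", 1), ("b", 2), ("b", 3),
    ("b", 1), ("b", 2), ("b", 3),
]
idx = pd.MultiIndex.from_tuples(tuples, names=["instance", "index"])
df = pd.DataFrame({"A": [10, 12, 4, 12, 5, 2, 12, 5, 2]}, index=idx)

# Count how many rows share each (instance, index) pair.
counts = df.groupby(level=["instance", "index"]).size()
# Keep only the pairs that occur more than once.
repeated = counts[counts > 1]
# Reduce to the distinct level-0 labels of those pairs.
dup_instances = repeated.index.get_level_values("instance").unique()
print(dup_instances.tolist())  # ['b']
```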
Answered by ejames
This should give you the whole row, which isn't quite what you asked for but might be close enough:
df[df.index.get_level_values('instance').duplicated()]
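One caveat worth checking: duplicated() on the level values alone flags every repeat of an "instance" label, so on the question's frame it also selects the second and third "a" rows. If the goal is rows whose full (instance, index) pair repeats, calling duplicated() on the whole index may be closer. The sketch below compares the two on the question's frame, rebuilt here under the assumption that the names match the printout; `by_level` and `by_pair` are illustrative names.

```python
import pandas as pd

tuples = [
    ("a", 1), ("a", 2), ("a", 3),
    ("b", 1), ("b", 2), ("b", 3),
    ("b", 1), ("b", 2), ("b", 3),
]
idx = pd.MultiIndex.from_tuples(tuples, names=["instance", "index"])
df = pd.DataFrame({"A": [10, 12, 4, 12, 5, 2, 12, 5, 2]}, index=idx)

# Level-only check: everything after the first "a" and the first "b" is flagged.
by_level = df[df.index.get_level_values("instance").duplicated()]

# Whole-index check: keep=False flags every row whose (instance, index) pair repeats.
by_pair = df[df.index.duplicated(keep=False)]

print(len(by_level), len(by_pair))  # 7 6
print(by_pair.index.get_level_values("instance").unique().tolist())  # ['b']
```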
Answered by MTrenfield
You want the duplicated method:
# Assumes 'Instance' is a regular column; for an index level, call df.reset_index() first
df['Instance'].duplicated()

