pandas: how to find duplicate indices in a DataFrame

Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/28014491/

Time: 2020-09-13 22:51:18  Source: igfitidea

How do I find duplicate indices in a DataFrame?

Tags: python, pandas, dataframe, multi-index

Asked by Pat Patterson

I have a pandas DataFrame with a multi-level index ("instance" and "index"). I want to find all the first-level ("instance") index values which are non-unique and to print out those values.

My frame looks like this:

                     A
instance  index      
      a       1      10
              2      12
              3      4
      b       1      12
              2      5
              3      2 
      b       1      12
              2      5
              3      2

I want to find "b" as the duplicate 0-level index and print its value ("b") out.

Answered by Alex Riley

You can use the get_duplicates() method:

>>> df.index.get_level_values('instance').get_duplicates()
[0, 1]

(In my example data, 0 and 1 both appear multiple times.)

The get_level_values() method accepts a label (such as 'instance') or an integer and retrieves the relevant part of the MultiIndex.
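
As a hedged aside: to my knowledge, Index.get_duplicates() was deprecated in pandas 0.23 and removed in 1.0, so the call above fails on current versions. The sketch below shows the replacement that the deprecation note suggested, plus the label and integer forms of get_level_values(). The frame is a hypothetical reconstruction of the one in the question, not the answerer's own example data.

```python
import pandas as pd

# Hypothetical frame mirroring the question: the whole "b" block appears twice.
index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("a", 3),
     ("b", 1), ("b", 2), ("b", 3),
     ("b", 1), ("b", 2), ("b", 3)],
    names=["instance", "index"],
)
df = pd.DataFrame({"A": [10, 12, 4, 12, 5, 2, 12, 5, 2]}, index=index)

# get_level_values() accepts either the level name or its integer position.
by_name = df.index.get_level_values("instance")
by_position = df.index.get_level_values(0)

# Stand-in for the removed get_duplicates(): labels that occur more than once.
repeated_labels = by_name[by_name.duplicated()].unique()
print(list(repeated_labels))
```

Note that on a frame shaped like the question's, every outer label spans several inner rows, so this reports both 'a' and 'b'. It answers "which labels occur more than once in this level", not "which (instance, index) pairs are repeated".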

Answered by Primer

Assuming that your df has an index made of 'instance' and 'index', you could do this:

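# Count the rows for each (instance, index) pair
df1 = df.reset_index().pivot_table(index=['instance','index'], values='A', aggfunc='count')
# dropna() discards pairs that the mask turns into NaN when df1 is a DataFrame
df1[df1 > 1].dropna().index.get_level_values(0).drop_duplicates()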

Which yields:

Index([u'b'], dtype='object')

Adding .values at the end (.drop_duplicates().values) will make an array:

array(['b'], dtype=object)

Or the same with one line using .groupby:

df[df.groupby(level=['instance','index']).count() > 1].dropna().index.get_level_values(0).drop_duplicates()
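
A minimal sketch of the same counting idea, assuming a frame rebuilt by hand from the question (the construction below is hypothetical). It uses groupby(...).size(), which returns a Series, so the boolean filter drops the non-repeated pairs directly and no dropna() is needed. It is one possible variant, not the answerer's code.

```python
import pandas as pd

# Hypothetical frame mirroring the question: the whole "b" block appears twice.
index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("a", 3),
     ("b", 1), ("b", 2), ("b", 3),
     ("b", 1), ("b", 2), ("b", 3)],
    names=["instance", "index"],
)
df = pd.DataFrame({"A": [10, 12, 4, 12, 5, 2, 12, 5, 2]}, index=index)

# Number of rows for each (instance, index) pair.
pair_counts = df.groupby(level=["instance", "index"]).size()

# Keep pairs seen more than once, then reduce them to their outer labels.
repeated_pairs = pair_counts[pair_counts > 1]
dup_instances = repeated_pairs.index.get_level_values("instance").unique()
print(list(dup_instances))  # ['b']
```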

Answered by ejames

This should give you the whole rows, which isn't quite what you asked for but might be close enough:

df[df.index.get_level_values('instance').duplicated()]
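
A related sketch, under the assumption that "duplicate" means a repeated (instance, index) pair: calling duplicated() on the full MultiIndex with keep=False marks every row whose pair occurs more than once. The frame is a hypothetical reconstruction of the one in the question.

```python
import pandas as pd

# Hypothetical frame mirroring the question: the whole "b" block appears twice.
index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("a", 3),
     ("b", 1), ("b", 2), ("b", 3),
     ("b", 1), ("b", 2), ("b", 3)],
    names=["instance", "index"],
)
df = pd.DataFrame({"A": [10, 12, 4, 12, 5, 2, 12, 5, 2]}, index=index)

# keep=False flags all occurrences of a repeated pair, not only the later ones.
dup_rows = df[df.index.duplicated(keep=False)]

# Outer labels of the repeated rows.
dup_instances = dup_rows.index.get_level_values("instance").unique()
print(list(dup_instances))  # ['b']
```

With the default keep='first', the first copy of each repeated pair would be left out of dup_rows, though the resulting labels would be the same here.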

Answered by MTrenfield

You want the duplicated method:

df['Instance'].duplicated()
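
In the question, 'instance' is an index level and not a column, so column-style access presumably needs reset_index() first. The sketch below assumes that, uses the lowercase level name from the question, and runs on a hypothetical reconstruction of the frame.

```python
import pandas as pd

# Hypothetical frame mirroring the question: the whole "b" block appears twice.
index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("a", 3),
     ("b", 1), ("b", 2), ("b", 3),
     ("b", 1), ("b", 2), ("b", 3)],
    names=["instance", "index"],
)
df = pd.DataFrame({"A": [10, 12, 4, 12, 5, 2, 12, 5, 2]}, index=index)

# Move both index levels into ordinary columns.
flat = df.reset_index()

# Series.duplicated() marks every repeat of a label after its first occurrence.
repeat_mask = flat["instance"].duplicated()

# To flag repeated (instance, index) pairs instead, compare both columns.
pair_mask = flat.duplicated(subset=["instance", "index"], keep=False)
dup_instances = flat.loc[pair_mask, "instance"].unique()
print(list(dup_instances))  # ['b']
```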