pandas: How do I find duplicate indices in a DataFrame?

Question

Asked by Pat Patterson

I have a pandas DataFrame with a multi-level index ("instance" and "index"). I want to find all the first-level ("instance") index values which are non-unique and to print out those values.

My frame looks like this:

                     A
instance  index      
      a       1      10
              2      12
              3      4
      b       1      12
              2      5
              3      2 
      b       1      12
              2      5
              3      2

I want to find "b" as the duplicate 0-level index and print its value ("b") out.
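
To try the answers below locally, the frame can be rebuilt roughly as follows. The question only shows the printed output, so the construction through `MultiIndex.from_arrays` is an assumption; what matters is that the ("instance", "index") pairs under "b" occur twice.

```python
import pandas as pd

# Assumed reconstruction of the frame printed in the question.
index = pd.MultiIndex.from_arrays(
    [list("aaabbbbbb"), [1, 2, 3, 1, 2, 3, 1, 2, 3]],
    names=["instance", "index"],
)
df = pd.DataFrame({"A": [10, 12, 4, 12, 5, 2, 12, 5, 2]}, index=index)

print(df)
print(df.index.is_unique)  # False, because the "b" block is repeated
```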

Answer 1

Answered by Alex Riley

You can use the get_duplicates() method (available in older pandas only; it was deprecated in 0.23 and removed in 1.0):

>>> df.index.get_level_values('instance').get_duplicates()
[0, 1]

(In my example data, 0 and 1 both appear multiple times.)
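
On pandas versions that no longer ship `get_duplicates()`, the same result can probably be obtained by combining `duplicated()` and `unique()` on the level values. The sketch below uses made-up data in the spirit of the answer (0 and 1 repeat, 2 does not); the frame and its values are assumptions, not the answerer's actual data.

```python
import pandas as pd

# Hypothetical data: "instance" values 0 and 1 repeat, 2 appears once.
index = pd.MultiIndex.from_arrays(
    [[0, 0, 1, 1, 2], [1, 2, 1, 2, 1]],
    names=["instance", "index"],
)
df = pd.DataFrame({"A": [10, 12, 4, 12, 5]}, index=index)

level = df.index.get_level_values("instance")
# duplicated() flags every occurrence after the first;
# unique() then collapses the flagged values to one entry each.
dupes = level[level.duplicated()].unique()
print(dupes.tolist())  # [0, 1]
```

Be aware that this checks the "instance" level on its own, so any label that spans more than one row counts as repeated.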

The get_level_values() method accepts a level name (such as 'instance') or an integer position and returns that level of the MultiIndex as an Index, with one value per row.
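
A small illustration of the two ways to address a level, using a toy index that is not taken from the question:

```python
import pandas as pd

# Toy MultiIndex; the labels are illustrative only.
index = pd.MultiIndex.from_arrays(
    [["a", "a", "b"], [1, 2, 1]],
    names=["instance", "index"],
)

by_label = index.get_level_values("instance")  # address the level by name
by_position = index.get_level_values(0)        # address the level by position

print(by_label.tolist())             # ['a', 'a', 'b']
print(by_label.equals(by_position))  # True
```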

Answer 2

Answered by Primer

Assuming that your df has an index made of 'instance' and 'index', you could do this:

df1 = df.reset_index().pivot_table(index=['instance','index'], values='A', aggfunc='count')
df1[df1 > 1].index.get_level_values(0).drop_duplicates()

Which yields:

Index([u'b'], dtype='object')

Adding .values at the end (.drop_duplicates().values) returns a NumPy array instead:

array(['b'], dtype=object)
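
As a side note, `to_numpy()` and `tolist()` are alternatives to `.values` when an array or a plain list is wanted. The index below is a stand-in, not the result of the pivot above, and `dtype=object` is set explicitly only so that the array matches the output shown.

```python
import numpy as np
import pandas as pd

# Stand-in index with a repeated label; dtype=object mirrors the output above.
labels = pd.Index(["b", "b", "c"], dtype=object).drop_duplicates()

as_array = labels.to_numpy()  # NumPy array of the remaining labels
as_list = labels.tolist()     # plain Python list

print(as_array)
print(as_list)  # ['b', 'c']
```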

Or the same in one line using .groupby:

df[df.groupby(level=['instance','index']).count() > 1].dropna().index.get_level_values(0).drop_duplicates()
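
A variation on the same counting idea, sketched with `groupby(...).size()`, which yields a Series of counts that can be filtered directly. The frame is an assumed reconstruction of the one in the question, and the variable names are illustrative.

```python
import pandas as pd

# Assumed reconstruction of the question's frame.
index = pd.MultiIndex.from_arrays(
    [list("aaabbbbbb"), [1, 2, 3, 1, 2, 3, 1, 2, 3]],
    names=["instance", "index"],
)
df = pd.DataFrame({"A": [10, 12, 4, 12, 5, 2, 12, 5, 2]}, index=index)

# Count how often each full (instance, index) pair occurs.
counts = df.groupby(level=["instance", "index"]).size()

# Keep the pairs seen more than once and report their first-level labels.
repeated = counts[counts > 1]
dupes = repeated.index.get_level_values("instance").unique()
print(dupes.tolist())  # ['b']
```

Counting full pairs rather than the first level alone is what singles out "b" here: every "a" pair occurs exactly once.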

Answer 3

Answered by ejames

This should give you the whole rows, which isn't quite what you asked for but might be close enough:

df[df.index.get_level_values('instance').duplicated()]
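
The sketch below contrasts checking the "instance" level alone with checking the full index, again on an assumed reconstruction of the question's frame. It also shows the `keep` parameter of `duplicated()`, which controls whether the first occurrence is flagged.

```python
import pandas as pd

# Assumed reconstruction of the question's frame.
index = pd.MultiIndex.from_arrays(
    [list("aaabbbbbb"), [1, 2, 3, 1, 2, 3, 1, 2, 3]],
    names=["instance", "index"],
)
df = pd.DataFrame({"A": [10, 12, 4, 12, 5, 2, 12, 5, 2]}, index=index)

# Level-only check: any label spanning several rows is flagged after its
# first row, so "a" is caught along with "b".
level_mask = df.index.get_level_values("instance").duplicated()

# Full-index check: only repeated (instance, index) pairs are flagged.
later_rows = df[df.index.duplicated()]          # second and later occurrences
all_rows = df[df.index.duplicated(keep=False)]  # every occurrence

print(int(level_mask.sum()), len(later_rows), len(all_rows))  # 7 3 6
print(later_rows.index.get_level_values("instance").unique().tolist())  # ['b']
```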

Answer 4

Answered by MTrenfield

You want the duplicated() method (applied here to a column named 'Instance'):

df['Instance'].duplicated()
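
In the question, "instance" is an index level rather than a column, so a column-based check presumably needs `reset_index()` first. A minimal sketch under that assumption, using the lowercase level names from the question and an assumed reconstruction of the frame:

```python
import pandas as pd

# Assumed reconstruction of the question's frame.
index = pd.MultiIndex.from_arrays(
    [list("aaabbbbbb"), [1, 2, 3, 1, 2, 3, 1, 2, 3]],
    names=["instance", "index"],
)
df = pd.DataFrame({"A": [10, 12, 4, 12, 5, 2, 12, 5, 2]}, index=index)

# Turn both index levels into ordinary columns.
flat = df.reset_index()

# Flag rows whose (instance, index) combination has been seen before.
mask = flat.duplicated(subset=["instance", "index"])
dupes = flat.loc[mask, "instance"].unique().tolist()
print(dupes)  # ['b']
```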
