Find all duplicate rows in a pandas DataFrame

Question

Asked by Nico

I would like to be able to get the indices of all the instances of a duplicated row in a dataset without knowing the name and number of columns beforehand. So assume I have this:

I'd like to be able to get [1, 3, 4] and [2, 5]. Is there any way to achieve this? It sounds really simple, but since I don't know the columns beforehand I can't do something like df[col == x...].

Answer 1

Answered by jezrael

First filter all duplicated rows, then use groupby with apply, or convert the index with to_series and group that:

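# Keep only the rows whose value in 'col' occurs more than once.
df = df[df.col.duplicated(keep=False)]

# Collect the index labels of each group of duplicates.
a = df.groupby('col').apply(lambda x: list(x.index))
print(a)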
col
1    [1, 3, 4]
2       [2, 5]
dtype: object

# Equivalent: group the index itself by the column values.
a = df.index.to_series().groupby(df.col).apply(list)
print(a)
col
1    [1, 3, 4]
2       [2, 5]
dtype: object
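
The question's sample table is not shown above, so the following is only a minimal, self-contained sketch of the same idea. The data is hypothetical, chosen so that the duplicates land on index labels 1, 3, 4 and 2, 5; the column name 'col' simply follows the answer's code.

```python
import pandas as pd

# Hypothetical sample data (the question's original table is not shown);
# the column name 'col' follows the answer's code.
df = pd.DataFrame({"col": [0, 1, 2, 1, 1, 2]})

# keep=False marks every member of a duplicate group, not just the repeats.
dups = df[df["col"].duplicated(keep=False)]

# Group the index labels by the duplicated value and collect them into lists.
groups = dups.index.to_series().groupby(dups["col"]).apply(list)
print(groups.to_dict())  # {1: [1, 3, 4], 2: [2, 5]}
```

Row 0 does not appear in the result because its value occurs only once.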

If you need nested lists instead:

L = df.groupby('col').apply(lambda x: list(x.index)).tolist()
print(L)
[[1, 3, 4], [2, 5]]

If only the first column should be used, it can be selected by position with iloc:

# Select the first column by position, so its name does not need to be known.
first = df.iloc[:, 0]
a = (df[first.duplicated(keep=False)]
       .groupby(first).apply(lambda x: list(x.index)))
print(a)
col
1    [1, 3, 4]
2       [2, 5]
dtype: object
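
The question asks about whole rows when neither the names nor the number of columns are known in advance. One possible extension of the approach above, not part of the original answer, is to call DataFrame.duplicated without a subset (it then compares entire rows) and to group by every column. The frame below is hypothetical and only illustrates the idea.

```python
import pandas as pd

# Hypothetical two-column frame; neither the data nor the column names
# come from the original question.
df = pd.DataFrame({
    "a": [9, 1, 2, 1, 1, 2],
    "b": ["z", "x", "y", "x", "x", "y"],
})

# With no subset given, DataFrame.duplicated compares entire rows.
dups = df[df.duplicated(keep=False)]

# Group the index labels by every column, whatever the columns are called.
keys = [dups[c] for c in dups.columns]
result = dups.index.to_series().groupby(keys).apply(list).tolist()
print(result)  # [[1, 3, 4], [2, 5]]
```

Note that groupby drops keys containing NaN by default, so duplicated rows with missing values would be left out of the result; in pandas 1.1 and later, passing dropna=False to groupby should keep them.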
