pandas - How to drop unique rows in a pandas dataframe?
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/44888858/
How to drop unique rows in a pandas dataframe?
Asked by toto_tico
I am stuck with a seemingly easy problem: dropping unique rows in a pandas dataframe. Basically, the opposite of drop_duplicates().
Let's say this is my data:
A B C
0 foo 0 A
1 foo 1 A
2 foo 1 B
3 bar 1 A
I would like to drop the rows where A and B are unique, i.e. I would like to keep only rows 1 and 2.
I tried the following:
import pandas as pd

# Load DataFrame
df = pd.DataFrame({"A":["foo", "foo", "foo", "bar"], "B":[0,1,1,1], "C":["A","A","B","A"]})
uniques = df[['A', 'B']].drop_duplicates()
duplicates = df[~df.index.isin(uniques.index)]
But I only get row 2, as rows 0, 1, and 3 are in the uniques!
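To see why, it helps to inspect the intermediate uniques frame from the attempt above (a quick check for illustration, assuming the df defined above and pandas' default keep='first'):
# drop_duplicates keeps the first row of each (A, B) group by default,
# so index 1 survives in uniques and only index 2 gets filtered out
print (uniques)
A B
0 foo 0
1 foo 1
3 bar 1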
Answered by jezrael
Solutions for selecting all duplicated rows:
You can use duplicated with a subset and the parameter keep=False to select all duplicates:
df = df[df.duplicated(subset=['A','B'], keep=False)]
print (df)
A B C
1 foo 1 A
2 foo 1 B
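For reference, this is the boolean mask that duplicated produces here (a quick check, assuming the same df as above):
print (df.duplicated(subset=['A','B'], keep=False))
0 False
1 True
2 True
3 False
dtype: bool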
Solution with transform:
df = df[df.groupby(['A', 'B'])['A'].transform('size') > 1]
print (df)
A B C
1 foo 1 A
2 foo 1 B
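For intuition, transform('size') broadcasts each group's row count back onto df's original index, so the comparison lines up row by row (a quick illustration under the same assumptions):
# per-row group sizes, aligned with df's index
print (df.groupby(['A', 'B'])['A'].transform('size'))
0 1
1 2
2 2
3 1
Name: A, dtype: int64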
Slightly modified solutions for selecting all unique rows:
# invert the boolean mask with ~
df = df[~df.duplicated(subset=['A','B'], keep=False)]
print (df)
A B C
0 foo 0 A
3 bar 1 A
df = df[df.groupby(['A', 'B'])['A'].transform('size') == 1]
print (df)
A B C
0 foo 0 A
3 bar 1 A
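The unique-row masks are exact complements of the duplicated-row masks above, so the two selections partition df. A small sanity check for illustration (mask is just a throwaway name, same df assumed):
# every row ends up in exactly one of the two selections
mask = df.duplicated(subset=['A','B'], keep=False)
print (len(df[mask]) + len(df[~mask]) == len(df))
True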
Answered by toto_tico
I came up with a solution using groupby:
groupped = df.groupby(['A', 'B']).size().reset_index().rename(columns={0: 'count'})
uniques = groupped[groupped['count'] == 1]
# compare the (A, B) pairs themselves, not positions: groupped has its own 0..n-1 index, not df's
unique_pairs = list(zip(uniques['A'], uniques['B']))
duplicates = df[~df.set_index(['A', 'B']).index.isin(unique_pairs)]
Duplicates now has the proper result:
A B C
1 foo 1 A
2 foo 1 B
Also, my original attempt in the question can be fixed by simply adding keep=False in the drop_duplicates method:
# Load Dataframe
df = pd.DataFrame({"A":["foo", "foo", "foo", "bar"], "B":[0,1,1,1], "C":["A","A","B","A"]})
uniques = df[['A', 'B']].drop_duplicates(keep=False)
duplicates = df[~df.index.isin(uniques.index)]
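With keep=False, uniques keeps df's original index labels (0 and 3 here), so the index-based filter now picks out the intended rows (a quick check under the same assumptions):
print (uniques)
A B
0 foo 0
3 bar 1
print (duplicates)
A B C
1 foo 1 A
2 foo 1 B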
Please prefer @jezrael's answer, I think it is the safest(?), as I am relying on pandas indexes here.