Remove duplicate rows from a Pandas dataframe where only some columns have the same value

Question

Asked by beta

I have a pandas dataframe as follows:

I want only one row to remain out of each set of rows that share the same values in specific columns. In the example above, I mean columns A and B. In other words, if a combination of values in columns A and B occurs more than once in the dataframe, only one of those rows should remain (which one does not matter).

FWIW: the maximum number of so-called duplicate rows (that is, rows where columns A and B are the same) is 2.

The result should look like this:

or

Answer 1

Answered by jezrael

Use drop_duplicates with the subset parameter. By default the first row of each duplicate group is kept; to keep the last row instead, add keep='last':

# keep the first row of each (A, B) group (keep='first' is the default)
df1 = df.drop_duplicates(subset=['A', 'B'])
# same as
# df1 = df.drop_duplicates(subset=['A', 'B'], keep='first')
print(df1)
   A  B  C
0  1  2  x
2  3  4  z
3  3  5  x

# keep the last row of each (A, B) group instead
df2 = df.drop_duplicates(subset=['A', 'B'], keep='last')
print(df2)
   A  B  C
1  1  2  y
2  3  4  z
3  3  5  x
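
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The input frame is an assumption: the question's table is not reproduced here, so df is reconstructed from the two printed outputs above. The variable names first, last and mask are only illustrative.

```python
import pandas as pd

# Assumed input, reconstructed from the printed outputs in the answer;
# the original table from the question is not shown on this page.
df = pd.DataFrame({
    "A": [1, 1, 3, 3],
    "B": [2, 2, 4, 5],
    "C": ["x", "y", "z", "x"],
})

# Keep the first row of each (A, B) combination (the default behaviour).
first = df.drop_duplicates(subset=["A", "B"])

# Keep the last row of each (A, B) combination instead.
last = df.drop_duplicates(subset=["A", "B"], keep="last")

# duplicated() flags the rows that drop_duplicates would remove
# when the first occurrence is kept.
mask = df.duplicated(subset=["A", "B"])

print(first)
print(last)
print(mask.tolist())
```

Note that the surviving rows keep their original index labels; chaining .reset_index(drop=True) renumbers them from 0 if that matters for later steps.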
