Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/46640945/

Time: 2020-09-14 04:36:18  Source: igfitidea

Grouping by multiple columns to find duplicate rows in pandas

python, pandas

Asked by Shubham R

I have a DataFrame df:

id    val1    val2
 1     1.1      2.2
 1     1.1      2.2
 2     2.1      5.5
 3     8.8      6.2
 4     1.1      2.2
 5     8.8      6.2
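To try the answer below, the sample frame can be rebuilt like this. It is a minimal sketch that assumes the default integer index (0 to 5), which matches the index printed in the answer's output; the question itself does not show an index.

```python
import pandas as pd

# Sample data from the question; pandas assigns the default RangeIndex 0..5
df = pd.DataFrame({
    "id":   [1, 1, 2, 3, 4, 5],
    "val1": [1.1, 1.1, 2.1, 8.8, 1.1, 8.8],
    "val2": [2.2, 2.2, 5.5, 6.2, 2.2, 6.2],
})
print(df)
```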

I want to group by val1 and val2 and get a similar DataFrame containing only the rows whose combination of val1 and val2 occurs more than once.

Final df:

id    val1    val2
 1     1.1      2.2
 4     1.1      2.2
 3     8.8      6.2
 5     8.8      6.2

Answer by jezrael

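You need duplicated with the subset parameter to specify which columns to check, and keep=False so that every occurrence of a duplicated combination is marked True (not only the repeats after the first). Use the result as a mask and filter with boolean indexing: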

df = df[df.duplicated(subset=['val1','val2'], keep=False)]
print (df)
   id  val1  val2
0   1   1.1   2.2
1   1   1.1   2.2
3   3   8.8   6.2
4   4   1.1   2.2
5   5   8.8   6.2
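This result differs slightly from the final df in the question: it keeps both id 1 rows (index 0 and 1), because those two rows are identical in every column, and it keeps the original row order. If the intent is to count fully identical rows only once and list matching pairs together, one sketch under that assumption drops exact duplicates first and sorts afterwards. Sorting by id inside each pair is also an assumption, chosen to match the order shown in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "id":   [1, 1, 2, 3, 4, 5],
    "val1": [1.1, 1.1, 2.1, 8.8, 1.1, 8.8],
    "val2": [2.2, 2.2, 5.5, 6.2, 2.2, 6.2],
})

# Assumption: rows identical in every column (the two id=1 rows) count once
deduped = df.drop_duplicates()

# Keep every row whose (val1, val2) pair still occurs more than once
mask = deduped.duplicated(subset=["val1", "val2"], keep=False)
result = deduped[mask]

# Place rows sharing a pair next to each other, ordered by id within a pair
result = result.sort_values(["val1", "val2", "id"])
print(result)
```

On the sample data this yields ids 1, 4, 3, 5, the same rows and order as the question's final df.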

Detail:

print (df.duplicated(subset=['val1','val2'], keep=False))
0     True
1     True
2    False
3     True
4     True
5     True
dtype: bool
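Since the question is phrased as a group-by, the same rows can also be selected with groupby. A sketch, assuming the sample df from the question: transform("size") returns the size of each row's (val1, val2) group aligned to the original index, and groupby().filter keeps whole groups that pass a predicate. These should agree with duplicated(..., keep=False) as long as val1 and val2 contain no missing values, since groupby drops NaN keys by default while duplicated treats NaNs as equal.

```python
import pandas as pd

df = pd.DataFrame({
    "id":   [1, 1, 2, 3, 4, 5],
    "val1": [1.1, 1.1, 2.1, 8.8, 1.1, 8.8],
    "val2": [2.2, 2.2, 5.5, 6.2, 2.2, 6.2],
})

# Option 1: group size per row; any column works, "id" is just convenient
group_size = df.groupby(["val1", "val2"])["id"].transform("size")
via_transform = df[group_size > 1]

# Option 2: keep only groups that have more than one row
via_filter = df.groupby(["val1", "val2"]).filter(lambda g: len(g) > 1)

# Cross-check against the duplicated() mask from the answer
via_duplicated = df[df.duplicated(subset=["val1", "val2"], keep=False)]
print(via_transform.equals(via_duplicated))
print(via_filter.equals(via_duplicated))
```

The duplicated approach from the answer is usually the simplest; the groupby variants are mainly useful when the group size itself is needed, for example to filter on "more than two occurrences".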