Grouping by multiple columns to find duplicate rows in pandas
Original question: http://stackoverflow.com/questions/46640945/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Asked by Shubham R
I have a df:
id val1 val2
1 1.1 2.2
1 1.1 2.2
2 2.1 5.5
3 8.8 6.2
4 1.1 2.2
5 8.8 6.2
I want to group by val1 and val2 and get a similar DataFrame containing only the rows whose val1 and val2 combination occurs more than once.
Final df:
id val1 val2
1 1.1 2.2
4 1.1 2.2
3 8.8 6.2
5 8.8 6.2
Answered by jezrael
You need duplicated with the subset parameter to specify which columns to check, and keep=False so that every member of a duplicated group is marked, not only the later occurrences. The resulting boolean mask is then used to filter the rows by boolean indexing:
df = df[df.duplicated(subset=['val1', 'val2'], keep=False)]
print(df)
   id  val1  val2
0   1   1.1   2.2
1   1   1.1   2.2
3   3   8.8   6.2
4   4   1.1   2.2
5   5   8.8   6.2
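To run the filter end to end, the sample frame from the question has to be built first. The following is a minimal sketch, assuming the question's table uses the default integer index. The names mask, dupes and dupes_sorted are introduced here only for illustration, and the sort_values step is an optional extra (not part of the original answer) that places matching rows next to each other, as in the asker's expected layout:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id":   [1, 1, 2, 3, 4, 5],
    "val1": [1.1, 1.1, 2.1, 8.8, 1.1, 8.8],
    "val2": [2.2, 2.2, 5.5, 6.2, 2.2, 6.2],
})

# True for every row whose (val1, val2) pair appears more than once.
mask = df.duplicated(subset=["val1", "val2"], keep=False)

# Keep only the flagged rows; the original df is left untouched here.
dupes = df[mask]

# Optional: bring rows with the same (val1, val2) pair together.
dupes_sorted = dupes.sort_values(["val1", "val2"])
print(dupes_sorted)
```

Assigning the result to a new name instead of overwriting df keeps the original frame available for further checks.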
Detail (the mask, computed on the original df before it is filtered):
print(df.duplicated(subset=['val1', 'val2'], keep=False))
0     True
1     True
2    False
3     True
4     True
5     True
dtype: bool
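Since the question is phrased in terms of grouping, a groupby-based variant may also be worth knowing. This is a sketch and not part of the original answer: it assumes the same sample data, and the names group_size and result are illustrative. It counts the rows in each (val1, val2) group, broadcasts that count back to every row with transform, and keeps the rows whose group has more than one member:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id":   [1, 1, 2, 3, 4, 5],
    "val1": [1.1, 1.1, 2.1, 8.8, 1.1, 8.8],
    "val2": [2.2, 2.2, 5.5, 6.2, 2.2, 6.2],
})

# Number of rows in each (val1, val2) group, aligned to df's index.
group_size = df.groupby(["val1", "val2"])["id"].transform("size")

# Keep rows that belong to a group with more than one member.
result = df[group_size > 1]
print(result)
```

On this data it should select the same rows as the duplicated approach. duplicated is usually the more direct choice when only the filter is needed, while the groupby form is convenient when the group sizes themselves are also of interest.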

