pandas 按多列分组以查找重复行熊猫
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/46640945/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Grouping by multiple columns to find duplicate rows pandas
提问by Shubham R
i have a df
我有一个 df
id val1 val2
1 1.1 2.2
1 1.1 2.2
2 2.1 5.5
3 8.8 6.2
4 1.1 2.2
5 8.8 6.2
I want to group by val1 and val2
and get similar dataframe only with rows which has multiple occurance of same val1 and val2
combination.
我想val1 and val2
仅对具有相同val1 and val2
组合多次出现的行进行分组并获得类似的数据帧。
Final df:
最终 df:
id val1 val2
1 1.1 2.2
4 1.1 2.2
3 8.8 6.2
5 8.8 6.2
回答by jezrael
You need duplicated
with parameter subset
for specify columns for check with keep=False
for all duplicates for mask and filter by boolean indexing
:
您需要duplicated
使用参数subset
来指定用于检查keep=False
掩码和过滤器的所有重复项的列boolean indexing
:
df = df[df.duplicated(subset=['val1','val2'], keep=False)]
print (df)
id val1 val2
0 1 1.1 2.2
1 1 1.1 2.2
3 3 8.8 6.2
4 4 1.1 2.2
5 5 8.8 6.2
Detail:
细节:
print (df.duplicated(subset=['val1','val2'], keep=False))
0 True
1 True
2 False
3 True
4 True
5 True
dtype: bool