pandas: Delete duplicate rows with the same value in all columns in pandas
Original URL: http://stackoverflow.com/questions/44759840/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Time: 2020-09-14 03:52:59 Source: igfitidea
Asked by jovicbg
I have a DataFrame with about half a million rows. As far as I can see, there are plenty of duplicate rows. How can I drop the duplicate rows that have the same value in all of the columns (about 80 columns), not just in one?
df:
period_start_time id val1 val2 val3
06.13.2017 22:00:00 i53 32 2 10
06.13.2017 22:00:00 i32 32 2 10
06.13.2017 22:00:00 i32 4 2 8
06.13.2017 22:00:00 i32 4 2 8
06.13.2017 22:00:00 i32 4 2 8
06.13.2017 22:00:00 i20 7 7 22
06.13.2017 22:00:00 i20 7 7 22
Desired output:
period_start_time id val1 val2 val3
06.13.2017 22:00:00 i53 32 2 10
06.13.2017 22:00:00 i32 32 2 10
06.13.2017 22:00:00 i32 4 2 8
06.13.2017 22:00:00 i20 7 7 22
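Before dropping anything, it can help to see how many rows are exact duplicates. DataFrame.duplicated() flags every row that repeats an earlier row across all columns. A minimal sketch, assuming the sample table above is rebuilt by hand (the real frame has about 80 columns and would be loaded from elsewhere):

```python
import pandas as pd

# Hand-typed copy of the sample table (assumption: the real data is loaded elsewhere)
df = pd.DataFrame({
    "period_start_time": ["06.13.2017 22:00:00"] * 7,
    "id": ["i53", "i32", "i32", "i32", "i32", "i20", "i20"],
    "val1": [32, 32, 4, 4, 4, 7, 7],
    "val2": [2, 2, 2, 2, 2, 7, 7],
    "val3": [10, 10, 8, 8, 8, 22, 22],
})

# duplicated() compares all columns by default and marks each repeat
# after the first occurrence as True
mask = df.duplicated()
print(mask.sum())  # 3 rows would be dropped
print(df[mask])    # the repeated rows themselves (index 3, 4 and 6)
```

Passing keep=False to duplicated() flags every copy of a duplicated row, including the first occurrence, which is useful for inspecting whole groups of repeats.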
Answered by jezrael
Use drop_duplicates:
df = df.drop_duplicates()
print (df)
period_start_time id val1 val2 val3
0 06.13.2017 22:00:00 i53 32 2 10
1 06.13.2017 22:00:00 i32 32 2 10
2 06.13.2017 22:00:00 i32 4 2 8
5 06.13.2017 22:00:00 i20 7 7 22
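A self-contained sketch of the answer, again using a hand-typed copy of the sample table (an assumption; the real frame is larger). With no arguments, drop_duplicates() treats a row as a duplicate only when every column matches an earlier row, and it keeps the first occurrence (keep='first' is the default). The surviving rows keep their original index labels, which is why the output above jumps from 2 to 5; ignore_index=True (available since pandas 1.0) renumbers them. For contrast, subset=["id"] deduplicates on a single column, which is the behaviour the question wants to avoid.

```python
import pandas as pd

# Hand-typed copy of the sample table (assumption: the real data is loaded elsewhere)
df = pd.DataFrame({
    "period_start_time": ["06.13.2017 22:00:00"] * 7,
    "id": ["i53", "i32", "i32", "i32", "i32", "i20", "i20"],
    "val1": [32, 32, 4, 4, 4, 7, 7],
    "val2": [2, 2, 2, 2, 2, 7, 7],
    "val3": [10, 10, 8, 8, 8, 22, 22],
})

# Default: compare all columns, keep the first occurrence of each row
deduped = df.drop_duplicates()
print(deduped.index.tolist())  # [0, 1, 2, 5]

# Same rows, but with a fresh 0..n-1 index (pandas >= 1.0)
renumbered = df.drop_duplicates(ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# Contrast: deduplicating on one column also drops row 2 (i32, 4, 2, 8),
# even though it differs from row 1 in val1 and val3
by_id = df.drop_duplicates(subset=["id"])
print(by_id.index.tolist())  # [0, 1, 5]
```

On older pandas versions, df.drop_duplicates().reset_index(drop=True) gives the same renumbered result.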