Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/34810358/
How to find duplicates in pandas?
Asked by Luis Ramon Ramirez Rodriguez
I have a data frame of about 52,000 rows with some duplicates. When I use
df.drop_duplicates()
I lose about 1,000 rows, but I don't want to erase these rows; I want to know which ones are the duplicate rows.
Answered by Anton Protopopov
You could use duplicated for that:
df[df.duplicated()]
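As a minimal sketch of how this behaves, the example below builds a small data frame and selects the rows that duplicated flags. The data and the column names (name, score) are made up for illustration and are not from the original question.

```python
import pandas as pd

# Hypothetical sample data; column names are invented for this sketch.
df = pd.DataFrame({
    "name": ["ann", "bob", "ann", "cid", "bob", "ann"],
    "score": [1, 2, 1, 3, 2, 1],
})

# Boolean Series: True for every row that repeats an earlier row
# (the default is keep="first", so the first occurrence is not flagged).
mask = df.duplicated()

# Boolean indexing keeps only the flagged rows.
dupes = df[mask]
print(dupes)

# With default settings, the flagged rows are the same rows that
# drop_duplicates() would discard, so the counts match.
assert len(df) - len(df.drop_duplicates()) == len(dupes)
```

Here the rows at index 2, 4 and 5 are selected, since each repeats a row that appeared earlier.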
You could specify the keep argument depending on what you want. From the docs:
keep: {'first', 'last', False}, default 'first'
first: Mark duplicates as True except for the first occurrence.
last: Mark duplicates as True except for the last occurrence.
False: Mark all duplicates as True.
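To make the three keep options concrete, here is a small sketch comparing them on the same data. The data frame and its name column are hypothetical; only duplicated and its keep argument come from the answer above.

```python
import pandas as pd

# Hypothetical sample data: "ann" appears three times.
df = pd.DataFrame({"name": ["ann", "bob", "ann", "cid", "ann"]})

# keep="first": every repeat after the first occurrence is flagged.
first = df.duplicated(keep="first")
print(first.tolist())  # [False, False, True, False, True]

# keep="last": every repeat before the last occurrence is flagged.
last = df.duplicated(keep="last")
print(last.tolist())   # [True, False, True, False, False]

# keep=False: every row that has a duplicate anywhere is flagged.
every = df.duplicated(keep=False)
print(every.tolist())  # [True, False, True, False, True]

# Show all copies of the duplicated rows, grouped together for inspection.
all_copies = df[every].sort_values("name")
print(all_copies)
```

If the goal is to inspect every copy of a duplicated row rather than only the ones that drop_duplicates() would remove, keep=False is probably the option to reach for.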