pandas: How to find duplicates in pandas?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/34810358/


How to find duplicates in pandas?

python, pandas

Asked by Luis Ramon Ramirez Rodriguez

I have a data frame of about 52,000 rows with some duplicates. When I use

df.drop_duplicates()

I lose about 1,000 rows, but I don't want to erase these rows; I want to know which ones are the duplicate rows.

Answered by Anton Protopopov

You could use duplicated for that:

df[df.duplicated()]
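
For example, a minimal sketch with a small made-up DataFrame (the column names and values here are assumptions for illustration, not data from the original question):

import pandas as pd

# Hypothetical sample data standing in for the 52,000-row frame.
df = pd.DataFrame({'name': ['Ann', 'Bob', 'Ann', 'Cid'],
                   'score': [1, 2, 1, 2]})

# duplicated() returns a boolean mask; True marks rows that repeat an earlier row.
mask = df.duplicated()

# Select only the duplicate rows (here the second 'Ann'/1 row).
print(df[mask])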

You could specify the keep argument for what you want; from the docs:

keep : {'first', 'last', False}, default 'first'

  • first: Mark duplicates as True except for the first occurrence.
  • last: Mark duplicates as True except for the last occurrence.
  • False: Mark all duplicates as True.
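
A small illustration of the keep options, using a hypothetical one-column frame (again an assumption, not data from the post):

import pandas as pd

df = pd.DataFrame({'x': [1, 1, 2]})

print(df.duplicated(keep='first'))  # False, True, False  -> later copies flagged
print(df.duplicated(keep='last'))   # True, False, False  -> earlier copies flagged
print(df.duplicated(keep=False))    # True, True, False   -> every duplicated row flagged

# keep=False is handy when you want to see all copies of each duplicate at once:
print(df[df.duplicated(keep=False)])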
