pandas: How to find duplicates in pandas?

Question

Asked by Luis Ramon Ramirez Rodriguez

I have a data frame of about 52,000 rows with some duplicates. When I use

df.drop_duplicates()

I lose about 1,000 rows, but I don't want to delete these rows. I want to know which rows are the duplicates.

Answer 1

Answered by Anton Protopopov

You could use duplicated for that:

df[df.duplicated()]
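
As a minimal sketch of how this behaves, the example below builds a small hypothetical data frame (the column names and values are assumptions for illustration, not the asker's data) and selects the rows that duplicated() flags. With the default settings these are the same rows that drop_duplicates() would discard.

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "name": ["a", "b", "a", "c", "b"],
    "value": [1, 2, 1, 3, 2],
})

# Boolean Series: True for each row that repeats an earlier row.
mask = df.duplicated()

# Select only the flagged rows instead of dropping them.
dupes = df[mask]
print(dupes)

# The number of flagged rows equals the number of rows drop_duplicates() removes.
print(len(df) - len(df.drop_duplicates()), int(mask.sum()))
```

Here the rows at index 2 and 4 are selected, because each repeats an earlier row in every column.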

You can pass the keep argument to control which occurrences are marked. From the docs:

keep: {'first', 'last', False}, default 'first'
first: Mark duplicates as True except for the first occurrence.
last: Mark duplicates as True except for the last occurrence.
False: Mark all duplicates as True.
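
The three keep options can be compared side by side. The sketch below uses a hypothetical data frame (its contents are assumptions for illustration) in which one row appears three times.

```python
import pandas as pd

# Hypothetical sample data: the row ("a", 1) appears at index 0, 2 and 4.
df = pd.DataFrame({
    "name": ["a", "b", "a", "c", "a"],
    "value": [1, 2, 1, 3, 1],
})

# Default: every occurrence after the first is marked.
first_mask = df.duplicated(keep="first")

# Every occurrence before the last is marked.
last_mask = df.duplicated(keep="last")

# Every row that has a duplicate anywhere is marked, including the first one.
all_mask = df.duplicated(keep=False)

print(df[first_mask].index.tolist())  # [2, 4]
print(df[last_mask].index.tolist())   # [0, 2]
print(df[all_mask].index.tolist())    # [0, 2, 4]
```

For the question as asked, keep=False is probably the most useful setting, since it shows each group of duplicated rows in full rather than only the copies that would be dropped.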
