pandas: df.unique() on a whole DataFrame based on a column

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/43184491/

Date: 2020-09-14 · Source: igfitidea

df.unique() on whole DataFrame based on a column

Tags: python, python-3.x, pandas, dataframe, duplicates

Asked by JohnAndrews

I have a DataFrame df filled with rows and columns where there are duplicate Ids:

Index   Id   Type
0       a1   A
1       a2   A
2       b1   B
3       b3   B
4       a1   A
...

When I use:


uniqueId = df["Id"].unique() 

I get a list of unique IDs.

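For reference, Series.unique() returns a NumPy array of the distinct values, not a filtered DataFrame, which is why it drops the Type column. A minimal sketch reconstructing the sample data from the question:

```python
import pandas as pd

# Reconstruct the sample data from the question
df = pd.DataFrame({
    "Id": ["a1", "a2", "b1", "b3", "a1"],
    "Type": ["A", "A", "B", "B", "A"],
})

# unique() returns a NumPy array of values, losing the DataFrame structure
unique_ids = df["Id"].unique()
print(unique_ids)  # → ['a1' 'a2' 'b1' 'b3']
```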

How can I apply this filtering to the whole DataFrame, so that the structure is kept but the duplicates (based on "Id") are removed?

Answered by jezrael

It seems you need DataFrame.drop_duplicates with the parameter subset, which specifies which column(s) to check for duplicates:

# keep the first occurrence of each Id (the default)
df = df.drop_duplicates(subset=['Id'])
print(df)
       Id Type
Index         
0      a1    A
1      a2    A
2      b1    B
3      b3    B


# keep the last occurrence of each Id
df = df.drop_duplicates(subset=['Id'], keep='last')
print(df)
       Id Type
Index         
1      a2    A
2      b1    B
3      b3    B
4      a1    A


# drop every row whose Id is duplicated
df = df.drop_duplicates(subset=['Id'], keep=False)
print(df)
       Id Type
Index         
1      a2    A
2      b1    B
3      b3    B
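As a variation on the answer above, the same results can be obtained with a boolean mask built from Series.duplicated, which is handy when you want to combine the duplicate test with other filters. A sketch, assuming the same sample data as in the question:

```python
import pandas as pd

df = pd.DataFrame({
    "Id": ["a1", "a2", "b1", "b3", "a1"],
    "Type": ["A", "A", "B", "B", "A"],
})

# duplicated() marks repeats; ~ inverts the mask, keeping first occurrences
first = df[~df["Id"].duplicated()]

# keep=False marks *every* row of a duplicated Id, so ~ drops them all
no_dups = df[~df["Id"].duplicated(keep=False)]

print(first)    # rows with Ids a1, a2, b1, b3
print(no_dups)  # rows with Ids a2, b1, b3
```

This is behaviorally equivalent to drop_duplicates for a single-column subset, but the mask can be and-ed with additional conditions before indexing.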