pandas: df.unique() on a whole DataFrame based on a column

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/43184491/

Date: 2020-09-14 · Source: igfitidea

df.unique() on whole DataFrame based on a column

Tags: python, python-3.x, pandas, dataframe, duplicates

Asked by JohnAndrews

I have a DataFrame df filled with rows and columns where there are duplicate Ids:

Index   Id   Type
0       a1   A
1       a2   A
2       b1   B
3       b3   B
4       a1   A
...

When I use:


uniqueId = df["Id"].unique() 

I get a list of unique IDs.

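For reference, Series.unique() returns a NumPy array of the distinct values, not a filtered DataFrame, which is why it drops the Type column. A minimal sketch reconstructing the sample data from the question:

```python
import pandas as pd

# Reconstruct the sample data from the question
df = pd.DataFrame({
    "Id": ["a1", "a2", "b1", "b3", "a1"],
    "Type": ["A", "A", "B", "B", "A"],
})

# unique() returns a NumPy array of values, losing the DataFrame structure
unique_ids = df["Id"].unique()
print(unique_ids)  # → ['a1' 'a2' 'b1' 'b3']
```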

How can I apply this filtering to the whole DataFrame, so that the structure is kept but the duplicates (based on "Id") are removed?

Answered by jezrael

It seems you need DataFrame.drop_duplicates with the parameter subset, which specifies which column(s) to check for duplicates:

# keep the first occurrence of each Id (the default)
df = df.drop_duplicates(subset=['Id'])
print(df)
       Id Type
Index         
0      a1    A
1      a2    A
2      b1    B
3      b3    B


# keep the last occurrence of each Id
df = df.drop_duplicates(subset=['Id'], keep='last')
print(df)
       Id Type
Index         
1      a2    A
2      b1    B
3      b3    B
4      a1    A


# drop every row whose Id is duplicated
df = df.drop_duplicates(subset=['Id'], keep=False)
print(df)
       Id Type
Index         
1      a2    A
2      b1    B
3      b3    B
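As a variation on the answer above, the same results can be obtained with a boolean mask built from Series.duplicated, which is handy when you want to combine the duplicate test with other filters. A sketch, assuming the same sample data as in the question:

```python
import pandas as pd

df = pd.DataFrame({
    "Id": ["a1", "a2", "b1", "b3", "a1"],
    "Type": ["A", "A", "B", "B", "A"],
})

# duplicated() marks repeats; ~ inverts the mask, keeping first occurrences
first = df[~df["Id"].duplicated()]

# keep=False marks *every* row of a duplicated Id, so ~ drops them all
no_dups = df[~df["Id"].duplicated(keep=False)]

print(first)    # rows with Ids a1, a2, b1, b3
print(no_dups)  # rows with Ids a2, b1, b3
```

This is behaviorally equivalent to drop_duplicates for a single-column subset, but the mask can be and-ed with additional conditions before indexing.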