pandas: df.unique() on a whole DataFrame based on a column
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license, link to the original, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/43184491/
df.unique() on whole DataFrame based on a column
Asked by JohnAndrews
I have a DataFrame df
filled with rows and columns where there are duplicate Id's:
Index Id Type
0 a1 A
1 a2 A
2 b1 B
3 b3 B
4 a1 A
...
When I use:
uniqueId = df["Id"].unique()
I get a list of unique IDs.
How can I apply this filtering to the whole DataFrame, so that it keeps its structure but the duplicates (based on "Id") are removed?
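For context, a small sketch (rebuilding the sample data from the question) of why unique() alone is not enough: it returns only a NumPy array of the Id values, dropping the rest of the DataFrame:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"Id": ["a1", "a2", "b1", "b3", "a1"],
                   "Type": ["A", "A", "B", "B", "A"]})

uniqueId = df["Id"].unique()  # NumPy array of unique values; the Type column is lost
print(uniqueId)  # ['a1' 'a2' 'b1' 'b3']
```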
Answered by jezrael
It seems you need DataFrame.drop_duplicates with the subset parameter, which specifies the column(s) to test for duplicates:
#keep first duplicate value
df = df.drop_duplicates(subset=['Id'])
print (df)
Id Type
Index
0 a1 A
1 a2 A
2 b1 B
3 b3 B
#keep last duplicate value
df = df.drop_duplicates(subset=['Id'], keep='last')
print (df)
Id Type
Index
1 a2 A
2 b1 B
3 b3 B
4 a1 A
#remove all duplicate values
df = df.drop_duplicates(subset=['Id'], keep=False)
print (df)
Id Type
Index
1 a2 A
2 b1 B
3 b3 B
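The three variants above can be reproduced end to end (the sample data is rebuilt from the question; the boolean-indexing line with duplicated() is an equivalent alternative to keep=False, not part of the original answer):

```python
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({"Id": ["a1", "a2", "b1", "b3", "a1"],
                   "Type": ["A", "A", "B", "B", "A"]})
df.index.name = "Index"

first = df.drop_duplicates(subset=["Id"])              # keep first duplicate value
last = df.drop_duplicates(subset=["Id"], keep="last")  # keep last duplicate value
none = df.drop_duplicates(subset=["Id"], keep=False)   # remove all duplicate values

# Equivalent to keep=False: duplicated(keep=False) flags every row
# whose Id appears more than once, and ~ inverts the mask
none_alt = df[~df["Id"].duplicated(keep=False)]

print(first)
print(last)
print(none)
```

Note that drop_duplicates returns a new DataFrame by default; assign the result (or pass inplace=True) to keep it.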