Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/19718531/

Date: 2020-09-13 21:17:59  Source: igfitidea

Selecting unique observations in a pandas data frame

python pandas

Asked by Michael

I have a pandas data frame with a column uniqueid. I would like to remove all duplicates from the data frame based on this column, so that every remaining observation has a unique uniqueid.

Answered by cwharland

There is also the drop_duplicates() method, available on any data frame (see the pandas documentation). You can pass the column or columns to check for duplicates as the subset argument.

df.drop_duplicates(subset='uniqueid', inplace=True)
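For anyone who wants to try this out, here is a minimal sketch. The sample data and the value column are made up for illustration; only the uniqueid column name comes from the question. It leaves out inplace=True so the original frame is untouched, and it also shows the keep parameter, which controls which of the duplicated rows survives.

```python
import pandas as pd

# Hypothetical sample data; only the column name uniqueid comes from the question.
df = pd.DataFrame({'uniqueid': ['a', 'b', 'b', 'c'], 'value': [1, 2, 3, 4]})

# Without inplace=True, drop_duplicates returns a new frame and leaves df as it was.
# By default the first row of each duplicated uniqueid is kept.
deduped = df.drop_duplicates(subset='uniqueid')
print(deduped)

# keep='last' keeps the last row of each duplicated uniqueid instead.
deduped_last = df.drop_duplicates(subset='uniqueid', keep='last')
print(deduped_last)
```

Whether to use inplace=True or assign the result to a new name is mostly a matter of taste; assigning makes it easier to compare the frame before and after.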

Answered by TomAugspurger

Use the duplicated method

Since we only care whether uniqueid (A in my example) is duplicated, select that column and call duplicated on the resulting Series. Then use ~ to invert the booleans, so that True marks the rows to keep.

In [90]: df = pd.DataFrame({'A': ['a', 'b', 'b', 'c'], 'B': [1, 2, 3, 4]})

In [91]: df
Out[91]: 
   A  B
0  a  1
1  b  2
2  b  3
3  c  4

In [92]: df['A'].duplicated()
Out[92]: 
0    False
1    False
2     True
3    False
Name: A, dtype: bool

In [93]: df.loc[~df['A'].duplicated()]
Out[93]: 
   A  B
0  a  1
1  b  2
3  c  4
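
As a rough cross-check, the boolean-mask approach and drop_duplicates should give the same rows on this example frame. The sketch below reuses the A/B frame from the session above; the variable names are just for illustration. It also shows keep=False, which flags every row of a duplicated group, in case you want to discard all rows whose value appears more than once rather than keeping the first.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'b', 'c'], 'B': [1, 2, 3, 4]})

# Keep the first occurrence of each value in A, using the boolean mask.
mask_result = df.loc[~df['A'].duplicated()]

# The same selection via drop_duplicates.
method_result = df.drop_duplicates(subset='A')

# Both approaches select the same rows here.
same = mask_result.equals(method_result)
print(same)

# keep=False marks every member of a duplicated group as True,
# so only values that occur exactly once remain after inverting.
strictly_unique = df.loc[~df['A'].duplicated(keep=False)]
print(strictly_unique)
```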