Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/24136620/

Time: 2020-09-13 22:08:57  Source: igfitidea

Python Pandas: Keeping only dataframe rows containing first occurrence of an item

Tags: python, pandas

Asked by DIGSUM

I have this:

    Date value
0   1975     a
21  1975     b
1   1976     b
22  1976     c
3   1977     a
2   1977     b
4   1978     c
25  1978     d
5   1979     e
26  1979     f
6   1980     a
27  1980     f

I am having trouble finding a way to keep only the rows containing the first occurrence of each 'value'. I want to drop duplicate 'value' entries, keeping the row with the lowest 'Date'. The end result should be:

    Date value
0   1975     a
21  1975     b
22  1976     c
25  1978     d
5   1979     e
26  1979     f

Answered by FooBar

To make what Quazi posted a bit more explicit: drop_duplicates() is what you need. By default, it keeps the first occurrence of each duplicate and drops every later one; see the manual for more information. The first occurrence is the first row in the frame's current order, so to be sure the earliest 'Date' is kept, sort by 'Date' first:

>>> dataframe = oldDf.sort('Date').drop_duplicates(subset=['value'])
>>> dataframe
Out[490]: 
    Date value
0   1975     a
21  1975     b
22  1976     c
25  1978     d
5   1979     e
26  1979     f

Answered by Mattmattmatt

FooBar is right, but DataFrame.sort is deprecated (since pandas 0.17) and was removed in pandas 0.20. Use sort_values instead:

dataframe = oldDf.sort_values('Date').drop_duplicates(subset=['value'])
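
For anyone who wants to run this end to end, here is a minimal, self-contained sketch. It assumes the question's data is rebuilt by hand; the name old_df and the explicit index list are illustrative, not from the original post. A stable sort (mergesort) is requested so that rows sharing the same Date keep their original relative order.

```python
import pandas as pd

# Rebuild the question's data, keeping its original index labels.
old_df = pd.DataFrame(
    {
        "Date": [1975, 1975, 1976, 1976, 1977, 1977,
                 1978, 1978, 1979, 1979, 1980, 1980],
        "value": ["a", "b", "b", "c", "a", "b",
                  "c", "d", "e", "f", "a", "f"],
    },
    index=[0, 21, 1, 22, 3, 2, 4, 25, 5, 26, 6, 27],
)

# Sort by Date so the earliest row for each value comes first, then keep
# only the first occurrence of each value. mergesort is a stable sort, so
# rows with equal Date stay in their original order.
result = (
    old_df.sort_values("Date", kind="mergesort")
    .drop_duplicates(subset=["value"], keep="first")
)
print(result)
```

The rows that remain should be the ones with index labels 0, 21, 22, 25, 5 and 26, matching the desired output in the question.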

Answered by Quazi Farhan

df.drop_duplicates(subset=['value'], inplace=True)
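
One caveat worth illustrating: drop_duplicates keeps the first row in the frame's current order, so the call above gives the desired result only when the rows are already ordered by Date, as they are in the question. A small sketch with made-up, out-of-order data (hypothetical, not from the original post) shows the difference:

```python
import pandas as pd

# Hypothetical data where the earliest Date for 'a' is NOT the first row.
df = pd.DataFrame({"Date": [1977, 1975, 1976], "value": ["a", "a", "b"]})

# Without sorting, the first row in the current order wins:
# 'a' is kept with Date 1977.
unsorted_first = df.drop_duplicates(subset=["value"])

# Sorting by Date first keeps the earliest Date per value:
# 'a' is kept with Date 1975.
earliest = df.sort_values("Date").drop_duplicates(subset=["value"])

print(unsorted_first)
print(earliest)
```

Note also that inplace=True modifies df directly and returns None, so the result of that call should not be assigned to a variable.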