Python Pandas: Keep only the DataFrame rows containing the first occurrence of an item

Question

Asked by DIGSUM

I have this:

    Date value
0   1975     a
21  1975     b
1   1976     b
22  1976     c
3   1977     a
2   1977     b
4   1978     c
25  1978     d
5   1979     e
26  1979     f
6   1980     a
27  1980     f

I am having trouble finding a way to keep only the rows containing the first occurrence of a 'value'. I want to drop duplicate 'value' entries, keeping the row with the lowest 'Date'. The end result should be:

    Date value
0   1975     a
21  1975     b
22  1976     c
25  1978     d
5   1979     e
26  1979     f

Answer 1

Answered by FooBar

To make what Quazi posted a bit more explicit: drop_duplicates() is what you need. By default, it keeps the first occurrence and drops everything after it; see the manual for more information. So, to be sure the row with the lowest 'Date' comes first, sort by 'Date' before dropping duplicates:

>>> dataframe = oldDf.sort('Date').drop_duplicates(subset=['value'])
>>> dataframe
Out[490]: 
    Date value
0   1975     a
21  1975     b
22  1976     c
25  1978     d
5   1979     e
26  1979     f

Answer 2

Answered by Mattmattmatt

FooBar is right, but DataFrame.sort is deprecated (and no longer available in recent pandas versions); it has been replaced by sort_values:

dataframe = oldDf.sort_values('Date').drop_duplicates(subset=['value'])
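
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the frame from the question and applies the same sort-then-deduplicate approach. The names old_df and first_seen are my own, and passing kind="mergesort" is an assumption on my part to keep rows with equal 'Date' in their original order; the answers above do not specify a sort algorithm.

```python
import pandas as pd

# Rebuild the question's frame, including its non-sequential index labels.
old_df = pd.DataFrame(
    {
        "Date": [1975, 1975, 1976, 1976, 1977, 1977,
                 1978, 1978, 1979, 1979, 1980, 1980],
        "value": ["a", "b", "b", "c", "a", "b",
                  "c", "d", "e", "f", "a", "f"],
    },
    index=[0, 21, 1, 22, 3, 2, 4, 25, 5, 26, 6, 27],
)

# Stable sort by Date, then keep the first row seen for each value.
# keep="first" is the default; it is spelled out here for clarity.
first_seen = (
    old_df.sort_values("Date", kind="mergesort")
    .drop_duplicates(subset=["value"], keep="first")
)
print(first_seen)
```

The original index labels (0, 21, 22, ...) are preserved, which matches the result asked for in the question.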

Answer 3

Answered by Quazi Farhan

df.drop_duplicates(subset=['value'], inplace=True)
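
One thing worth noting about the one-liner above: it keeps whichever row comes first in the frame's current order, so it only gives the lowest 'Date' if the rows are already sorted by 'Date' (as they happen to be in the question). A small sketch with a made-up, unsorted frame shows the difference; the data and variable names here are hypothetical and not from the question.

```python
import pandas as pd

# Hypothetical frame whose rows are NOT in Date order.
df = pd.DataFrame({"Date": [1980, 1975, 1976], "value": ["a", "a", "b"]})

# Without sorting, the first row in the current order wins (1980 for 'a').
unsorted_result = df.drop_duplicates(subset=["value"])

# Sorting by Date first makes the earliest Date win (1975 for 'a').
sorted_result = (
    df.sort_values("Date", kind="mergesort")
    .drop_duplicates(subset=["value"])
)

# inplace=True modifies df itself and returns None, so do not assign the result.
returned = df.drop_duplicates(subset=["value"], inplace=True)

print(unsorted_result)
print(sorted_result)
```

If the input order cannot be relied on, sorting first (as in the other answers) is probably the safer choice.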
