Python Pandas: Keeping only dataframe rows containing first occurrence of an item
Original question: http://stackoverflow.com/questions/24136620/
Source: StackOverflow. Content is licensed under CC BY-SA 4.0; you are free to use and share it, but you must attribute it to the original authors.
Asked by DIGSUM
I have this:
Date value
0 1975 a
21 1975 b
1 1976 b
22 1976 c
3 1977 a
2 1977 b
4 1978 c
25 1978 d
5 1979 e
26 1979 f
6 1980 a
27 1980 f
I am having trouble finding a way to keep only the rows containing the first occurrence of each 'value'. I want to drop rows with a duplicate 'value', keeping the row with the lowest 'Date'. The end result should be:
Date value
0 1975 a
21 1975 b
22 1976 c
25 1978 d
5 1979 e
26 1979 f
Answered by FooBar
To make what Quazi posted a bit more explicit: drop_duplicates() is what you need. By default, it keeps the first occurrence and drops every later one; see the pandas documentation for more information. So, to be sure the first occurrence is also the row with the lowest 'Date', sort by 'Date' before dropping:
>>> dataframe = oldDf.sort('Date').drop_duplicates(subset=['value'])
>>> dataframe
Out[490]:
Date value
0 1975 a
21 1975 b
22 1976 c
25 1978 d
5 1979 e
26 1979 f
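As a small illustration of the "keeps the first occurrence" default, here is a sketch on a three-row toy frame (the frame and the names first and last are made up for this example, not taken from the question). It compares the default with keep="last":

```python
import pandas as pd

# Toy data (hypothetical): value "a" appears twice, "b" once.
df = pd.DataFrame({"Date": [1975, 1976, 1977], "value": ["a", "a", "b"]})

# Default behaviour is keep="first": the earliest row for each value survives.
first = df.drop_duplicates(subset=["value"])

# keep="last" keeps the final row for each value instead.
last = df.drop_duplicates(subset=["value"], keep="last")

print(first)
print(last)
```

Either way, "first" and "last" refer to the current row order, which is why the sort step matters.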
Answered by Mattmattmatt
FooBar is right, but DataFrame.sort was deprecated and has since been removed from pandas; use sort_values instead:
dataframe = oldDf.sort_values('Date').drop_duplicates(subset=['value'])
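For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's data, index labels included. The name old_df and the kind="mergesort" argument are my own choices rather than part of the answer; mergesort is a stable sort, so rows sharing a Date keep their original relative order.

```python
import pandas as pd

# Rebuild the frame from the question, keeping its non-sequential index labels.
old_df = pd.DataFrame(
    {
        "Date": [1975, 1975, 1976, 1976, 1977, 1977,
                 1978, 1978, 1979, 1979, 1980, 1980],
        "value": ["a", "b", "b", "c", "a", "b",
                  "c", "d", "e", "f", "a", "f"],
    },
    index=[0, 21, 1, 22, 3, 2, 4, 25, 5, 26, 6, 27],
)

# Sort by Date (stable, an assumption added here), then keep the first row
# seen for each distinct value.
result = old_df.sort_values("Date", kind="mergesort").drop_duplicates(subset=["value"])

print(result)
```

The rows that survive carry the index labels 0, 21, 22, 25, 5 and 26, which matches the result the question asks for.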
Answered by Quazi Farhan
df.drop_duplicates(subset=['value'], inplace=True)
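Note that this one-liner keeps the first row in whatever order the frame is currently in, so it only returns the lowest 'Date' if the frame is already sorted by 'Date'. A rough sketch of an in-place version that sorts first, using a small unsorted toy frame invented for illustration:

```python
import pandas as pd

# Toy data (hypothetical), deliberately not sorted by Date.
df = pd.DataFrame(
    {"Date": [1980, 1975, 1976, 1975], "value": ["a", "a", "b", "b"]}
)

# Sort in place so that "first occurrence" means "lowest Date" ...
df.sort_values("Date", kind="mergesort", inplace=True)
# ... then drop later duplicates of each value, also in place.
df.drop_duplicates(subset=["value"], inplace=True)

print(df)
```

With inplace=True both calls modify df and return None, so do not assign their results back to df.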

