pandas: dropping duplicates within a groupby on 'date'

Question

Asked by Michael Perdue

In the dataframe below, I would like to eliminate the duplicate cid values so that the output from df.groupby('date').cid.size() matches the output from df.groupby('date').cid.nunique().

I have looked at this post, but it does not seem to offer a solid solution to the problem.

df = pd.read_csv('https://raw.githubusercontent.com/108michael/ms_thesis/master/crsp.dime.mpl.df')

df.groupby('date').cid.size()

date
2005       7
2006     237
2007    3610
2008    1318
2009    2664
2010     997
2011    6390
2012    2904
2013    7875
2014    3979

df.groupby('date').cid.nunique()

date
2005      3
2006     10
2007    227
2008     52
2009    142
2010     57
2011    219
2012     99
2013    238
2014    146
Name: cid, dtype: int64

Things I tried:

df.groupby([df['date']]).drop_duplicates(cols='cid') gives this error: AttributeError: Cannot access callable attribute 'drop_duplicates' of 'DataFrameGroupBy' objects, try using the 'apply' method
df.groupby(('date').drop_duplicates('cid')) gives this error: AttributeError: 'str' object has no attribute 'drop_duplicates'

Answer 1

Answered by ayhan

You don't need groupby to drop duplicates based on a few columns. Pass those columns to drop_duplicates as a subset instead, so that only the first row for each (date, cid) pair is kept:

df2 = df.drop_duplicates(["date", "cid"])
df2.groupby('date').cid.size()
Out[99]: 
date
2005      3
2006     10
2007    227
2008     52
2009    142
2010     57
2011    219
2012     99
2013    238
2014    146
dtype: int64
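
To check the idea without downloading the CSV, here is a minimal sketch on a small made-up frame. The toy data and the extra `amount` column are assumptions for illustration only and are not taken from the question's dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for the CSV in the question.
df = pd.DataFrame({
    "date":   [2005, 2005, 2005, 2006, 2006, 2006, 2006],
    "cid":    ["a",  "a",  "b",  "a",  "c",  "c",  "c"],
    "amount": [1, 2, 3, 4, 5, 6, 7],
})

# Keep the first row of every (date, cid) pair; the other columns ride along.
df2 = df.drop_duplicates(subset=["date", "cid"])

# Row count per date after de-duplication ...
sizes = df2.groupby("date")["cid"].size()
# ... compared with the distinct-cid count per date on the original frame.
uniques = df.groupby("date")["cid"].nunique()

print(sizes.tolist() == uniques.tolist())
```

One caveat: nunique() ignores missing values by default, while size() counts every row, so if cid contains NaN the two counts can still differ for the affected dates.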
