Python: count unique values using pandas groupby

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me), citing the original URL and author information. Original Stack Overflow question: http://stackoverflow.com/questions/41415017/

Time: 2020-08-20 00:55:11  Source: igfitidea

Count unique values using pandas groupby

Tags: python, pandas, group-by

Asked by user1684046

I have data of the following form:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 2, 3, 3, 3, 4],
    'param': ['a', 'a', 'b', np.nan, 'a', 'a', np.nan]
})
print(df)

#    group param
# 0      1     a
# 1      1     a
# 2      2     b
# 3      3   NaN
# 4      3     a
# 5      3     a
# 6      4   NaN

Non-null values within groups are always the same. I want to count the non-null value for each group (where it exists) once, and then find the total counts for each value.

I'm currently doing this in the following (clunky and inefficient) way:

param = []
for _, group in df[df.param.notnull()].groupby('group'):
    param.append(group.param.unique()[0])
print(pd.DataFrame({'param': param}).param.value_counts())

# a    2
# b    1

I'm sure there's a way to do this more cleanly and without using a loop, but I just can't seem to work it out. Any help would be much appreciated.

Answered by jezrael

I think you can use SeriesGroupBy.nunique:

print(df.groupby('param')['group'].nunique())

# param
# a    2
# b    1
# Name: group, dtype: int64
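
To make it clearer why this one-liner gives the desired result, here is a self-contained sketch using the question's data. By default, groupby excludes rows whose grouping key is NaN, so group 4 (which has no non-null param) is not counted. The variable name `counts` is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 2, 3, 3, 3, 4],
    'param': ['a', 'a', 'b', np.nan, 'a', 'a', np.nan],
})

# Group by the value itself and count how many distinct groups contain it.
# Rows with a NaN 'param' are dropped from the grouping keys by default,
# so they never show up in the result.
counts = df.groupby('param')['group'].nunique()
print(counts)
```

Grouping by param rather than by group turns the problem around: instead of asking "which value does each group have?", it asks "in how many groups does each value occur?", which is exactly the total the question wants.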

Another solution uses unique, then creates a new DataFrame with DataFrame.from_records, reshapes it to a Series with stack, and finally applies value_counts:

a = df[df.param.notnull()].groupby('group')['param'].unique()
print(pd.DataFrame.from_records(a.values.tolist()).stack().value_counts())

# a    2
# b    1
# dtype: int64
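
As a possible variation on the same idea (not part of the original answer), the "one row per group and value" step could also be done with drop_duplicates before counting. This is a minimal sketch on the question's data; `unique_pairs` and `counts` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 2, 3, 3, 3, 4],
    'param': ['a', 'a', 'b', np.nan, 'a', 'a', np.nan],
})

# Drop rows without a value, then keep one row per (group, param) pair.
unique_pairs = df.dropna(subset=['param']).drop_duplicates(['group', 'param'])

# Each remaining row stands for one group, so counting values counts groups.
counts = unique_pairs['param'].value_counts()
print(counts)
```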

Answered by datapug

This is just an add-on to the solution in case you want to compute not only unique values but other aggregate functions:

df.groupby(['group']).agg(['min','max','count','nunique'])
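
To illustrate what such a multi-function aggregation returns, here is a small sketch on the question's data. It is restricted to the param column and to count and nunique for brevity; `summary` is an illustrative name. Both functions ignore NaN, so a group with no non-null values gets 0 in each column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 2, 3, 3, 3, 4],
    'param': ['a', 'a', 'b', np.nan, 'a', 'a', np.nan],
})

# One row per group, one column per aggregate function.
# 'count' is the number of non-null values; 'nunique' the number of distinct ones.
summary = df.groupby('group')['param'].agg(['count', 'nunique'])
print(summary)
```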

Hope you find it useful

Answered by nir

I know it has been a while since this was posted, but I think this will help too. I wanted to count unique values and filter the groups by the number of unique values. This is how I did it:

df.groupby('group').agg(['min','max','count','nunique']).reset_index(drop=False)
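
The line above only builds the summary table; the filtering step itself is not shown in the answer. One way it might look, assuming a hypothetical threshold of "at least one distinct non-null value" (the names `min_unique`, `n_unique`, `keep`, and `filtered` are illustrative, not from the answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 2, 3, 3, 3, 4],
    'param': ['a', 'a', 'b', np.nan, 'a', 'a', np.nan],
})

min_unique = 1  # hypothetical threshold, chosen only for illustration

# Number of distinct non-null 'param' values in each group.
n_unique = df.groupby('group')['param'].nunique()

# Keep the rows of every group that meets the threshold.
keep = n_unique[n_unique >= min_unique].index
filtered = df[df['group'].isin(keep)]
print(filtered)
```

With this threshold, group 4 is dropped because all of its values are NaN; a different cutoff would select a different set of groups.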