Count unique values using pandas groupby
Note: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/41415017/
Asked by user1684046
I have data of the following form:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 2, 3, 3, 3, 4],
    'param': ['a', 'a', 'b', np.nan, 'a', 'a', np.nan]
})
print(df)
#    group param
# 0      1     a
# 1      1     a
# 2      2     b
# 3      3   NaN
# 4      3     a
# 5      3     a
# 6      4   NaN
Non-null values within groups are always the same. I want to count the non-null value for each group (where it exists) once, and then find the total counts for each value.
I'm currently doing this in the following (clunky and inefficient) way:
param = []
for _, group in df[df.param.notnull()].groupby('group'):
    # non-null values within a group are identical, so take the single unique one
    param.append(group.param.unique()[0])

print(pd.DataFrame({'param': param}).param.value_counts())
# a    2
# b    1
I'm sure there's a way to do this more cleanly and without using a loop, but I just can't seem to work it out. Any help would be much appreciated.
Answered by jezrael
I think you can use SeriesGroupBy.nunique:
print(df.groupby('param')['group'].nunique())
param
a    2
b    1
Name: group, dtype: int64
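Note that the rows whose param is NaN drop out of this count automatically, because groupby skips NaN group keys by default. A minimal sketch of that behaviour, assuming pandas >= 1.1 (where the dropna keyword was added):

# Default dropna=True: the NaN param rows are ignored, which matches the question.
print(df.groupby('param')['group'].nunique())
# param
# a    2
# b    1
# Name: group, dtype: int64

# dropna=False (pandas >= 1.1) would also count a NaN bucket, which is not wanted here.
print(df.groupby('param', dropna=False)['group'].nunique())
# param
# a      2
# b      1
# NaN    2
# Name: group, dtype: int64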
Another solution with unique, then create a new DataFrame with DataFrame.from_records, reshape it to a Series with stack, and finally apply value_counts:
a = df[df.param.notnull()].groupby('group')['param'].unique()
print(pd.DataFrame.from_records(a.values.tolist()).stack().value_counts())
a    2
b    1
dtype: int64
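Since the question notes that the non-null values within a group are always identical, the same idea can be written a little more compactly; this is just a sketch, assuming pandas >= 0.25 for Series.explode:

a = df[df.param.notnull()].groupby('group')['param'].unique()
# Each group contributes a one-element array, so exploding and counting
# gives the same totals as the from_records/stack route.
print(a.explode().value_counts())
# a    2
# b    1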
Answered by datapug
This is just an add-on to the solution above, in case you want to compute other aggregate functions in addition to the unique counts:
df.groupby(['group']).agg(['min','max','count','nunique'])
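Because agg is applied to every non-grouping column here, the result carries a MultiIndex on its columns; selecting the param column first keeps the output flat. A small illustrative sketch, with the expected output shown as comments:

summary = df.groupby('group')['param'].agg(['min', 'max', 'count', 'nunique'])
print(summary)
#       min  max  count  nunique
# group
# 1       a    a      2        1
# 2       b    b      1        1
# 3       a    a      2        1
# 4     NaN  NaN      0        0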
Hope you find it useful
Answered by nir
I know it has been a while since this was posted, but I think this will help too. I wanted to count unique values and filter the groups by the number of these unique values; this is how I did it:
df.groupby('group').agg(['min','max','count','nunique']).reset_index(drop=False)
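The line above only computes the aggregates; the filtering step mentioned in the answer is not shown. A minimal sketch of one way to do it, using a purely illustrative threshold of at least one distinct non-null param value per group:

# Count distinct non-null param values per group, then keep only the rows
# belonging to groups that meet the (illustrative) threshold.
counts = df.groupby('group')['param'].nunique()
kept_groups = counts[counts >= 1].index
print(df[df['group'].isin(kept_groups)])
#    group param
# 0      1     a
# 1      1     a
# 2      2     b
# 3      3   NaN
# 4      3     a
# 5      3     a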