Python: Pandas groupby with bin counts

Notice: this page is a Chinese-English translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/34317149/

Pandas groupby with bin counts

Tags: python, pandas, dataframe, pandas-groupby

Asked by metersk

I have a DataFrame that looks like this:

+----------+---------+-------+
| username | post_id | views |
+----------+---------+-------+
| john     |       1 |     3 |
| john     |       2 |    23 |
| john     |       3 |    44 |
| john     |       4 |    82 |
| jane     |       7 |     5 |
| jane     |       8 |    25 |
| jane     |       9 |    46 |
| jane     |      10 |    56 |
+----------+---------+-------+

and I would like to transform it to count views that belong to certain bins like this:

+------+------+-------+-------+--------+
|      | 1-10 | 11-25 | 26-50 | 51-100 |
+------+------+-------+-------+--------+
| john |    1 |     1 |     1 |      1 |
| jane |    1 |     1 |     1 |      1 |
+------+------+-------+-------+--------+

I tried:

import pandas as pd

bins = [1, 10, 25, 50, 100]
# Grouping only by the binned views yields one count per bin across all users
groups = df.groupby(pd.cut(df.views, bins))
groups.username.count()

But it only gives aggregate counts and not counts by user. How can I get bin counts by user?

The aggregate counts (using my real data) look like this:

impressions
(2500, 5000]         2332
(5000, 10000]        1118
(10000, 50000]        570
(50000, 10000000]      14
Name: username, dtype: int64

Accepted answer by Alex Riley

You could group by both the bins and username, compute the group sizes and then use unstack():

>>> groups = df.groupby(['username', pd.cut(df.views, bins)])
>>> groups.size().unstack()
views     (1, 10]  (10, 25]  (25, 50]  (50, 100]
username
jane            1         1         1          1
john            1         1         1          1
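
For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds the sample table from the question and applies the same groupby, size and unstack approach. The bin edges starting at 0 and the `labels` list are my own assumptions, chosen so that the columns read like the desired output; they are not part of the original answer. Note that `pd.cut` uses right-closed intervals by default, so with the original edges a view count of exactly 1 would fall outside `(1, 10]`.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "username": ["john"] * 4 + ["jane"] * 4,
    "post_id": [1, 2, 3, 4, 7, 8, 9, 10],
    "views": [3, 23, 44, 82, 5, 25, 46, 56],
})

# Assumption: start the first bin at 0 so that a view count of 1 lands in (0, 10].
bins = [0, 10, 25, 50, 100]
# Assumption: illustrative labels that mirror the headers of the desired table.
labels = ["1-10", "11-25", "26-50", "51-100"]

# Assign each row to a bin; the result is a categorical Series aligned with df.
binned = pd.cut(df["views"], bins=bins, labels=labels)

# Group by user and bin, count rows per group, then pivot the bins into columns.
# observed=False keeps bins with no rows; fill_value=0 covers any missing pairs.
counts = (
    df.groupby(["username", binned], observed=False)
      .size()
      .unstack(fill_value=0)
)
print(counts)

# pd.crosstab is another way to build the same kind of frequency table.
alt = pd.crosstab(df["username"], binned)
print(alt)
```

Both `counts` and `alt` have one row per user and one column per bin. The groupby version is closer to the accepted answer, while `pd.crosstab` may be more convenient when all you need is a frequency table.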