Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/31080383/

Date: 2020-09-13 23:32:47  Source: igfitidea

Counting categorical data pandas group by dataframe

Tags: python, pandas

Asked by metersk

I have a data frame that looks like this:

+---+-----------+----------------+-------+
|   |    uid    |      msg       | count |
+---+-----------+----------------+-------+
| 0 | 121437681 | eis            |     1 |
| 1 |  14403832 | eis            |     1 |
| 2 | 190442364 | eis            |     1 |
| 3 | 190102625 | eis            |     1 |
| 4 | 190428772 | eis_reply      |     1 |
| 5 | 190428772 | single_message |     1 |
| 6 | 190428772 | yes            |     1 |
| 7 | 190104837 | eis            |     1 |
| 8 | 144969454 | eis            |     1 |
| 9 | 190738403 | eis            |     1 |
+---+-----------+----------------+-------+

What I would like to do is count the instances of each msg for each uid.

I created a groupby object and found the count of all messages:

# count() takes no column argument, so select the msg column first
grouped_test = test.groupby('uid')
grouped_test['msg'].count()

But I'm not quite sure how to count each type of message for each uid. I was thinking about creating masks and 4 separate data frames, but that doesn't seem like an efficient way to accomplish this.

Sample data - http://www.sharecsv.com/s/16573757eb123c5b15cae4edcb7296e3/sample_data.csv

Answered by BrenBarn

Group by uid and apply value_counts to the msg column:

>>> d.groupby('uid').msg.value_counts()
uid                      
14403832   eis               1
121437681  eis               1
144969454  eis               1
190102625  eis               1
190104837  eis               1
190170637  eis               1
190428772  eis               1
           single_message    1
           yes               1
           eis_reply         1
190442364  eis               1
190738403  eis               1
190991478  single_message    1
           eis_reply         1
           yes               1
191356453  eis               1
191619393  eis               1
dtype: int64
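
As a self-contained sketch of the same idea, the snippet below rebuilds only the ten rows shown in the question's table (not the full linked CSV, so the output differs from the listing above) and applies value_counts per uid. The variable names per_uid and wide are illustrative, and the unstack step is an optional extra that is not part of the original answer.

```python
import pandas as pd

# The ten rows from the question's table (assumption: the linked CSV has more).
df = pd.DataFrame({
    "uid": [121437681, 14403832, 190442364, 190102625, 190428772,
            190428772, 190428772, 190104837, 144969454, 190738403],
    "msg": ["eis", "eis", "eis", "eis", "eis_reply",
            "single_message", "yes", "eis", "eis", "eis"],
    "count": [1] * 10,
})

# One count per (uid, msg) pair, returned as a Series with a two-level index.
per_uid = df.groupby("uid")["msg"].value_counts()
print(per_uid)

# Optional: move the msg level into columns; pairs that never occur become 0.
wide = per_uid.unstack(fill_value=0)
print(wide)
```

The wide form has one row per uid and one column per message type, which can be easier to scan when there are only a few message types.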

Answered by Alexander

Apply groupby on both uid and msg, and then sum the count of each:

>>> df.groupby(['uid', 'msg'])['count'].sum()
uid        msg           
14403832   eis               1
121437681  eis               1
144969454  eis               1
190102625  eis               1
190104837  eis               1
190170637  eis               1
190428772  eis               1
           eis_reply         1
           single_message    1
           yes               1
190442364  eis               1
190738403  eis               1
190991478  eis_reply         1
           single_message    1
           yes               1
191356453  eis               1
191619393  eis               1
Name: count, dtype: int64

You can reset the index to retrieve the flattened version:

>>> df.groupby(['uid', 'msg'])['count'].sum().reset_index()
          uid             msg  count
0    14403832             eis      1
1   121437681             eis      1
2   144969454             eis      1
3   190102625             eis      1
4   190104837             eis      1
5   190170637             eis      1
6   190428772             eis      1
7   190428772       eis_reply      1
8   190428772  single_message      1
9   190428772             yes      1
10  190442364             eis      1
11  190738403             eis      1
12  190991478       eis_reply      1
13  190991478  single_message      1
14  190991478             yes      1
15  191356453             eis      1
16  191619393             eis      1
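
The approach above relies on the frame already having a count column. As a hedged sketch, the snippet below rebuilds the ten rows from the question's table and compares that sum with size(), which counts rows per group and so needs no count column. The names summed and sized, and the output column name "n", are illustrative choices rather than anything from the original answer.

```python
import pandas as pd

# The ten rows from the question's table (assumption: the linked CSV has more).
df = pd.DataFrame({
    "uid": [121437681, 14403832, 190442364, 190102625, 190428772,
            190428772, 190428772, 190104837, 144969454, 190738403],
    "msg": ["eis", "eis", "eis", "eis", "eis_reply",
            "single_message", "yes", "eis", "eis", "eis"],
    "count": [1] * 10,
})

# Sum the existing 'count' column per (uid, msg) pair, then flatten the index.
summed = df.groupby(["uid", "msg"])["count"].sum().reset_index()

# size() counts rows per group; reset_index(name=...) names the new column.
sized = df.groupby(["uid", "msg"]).size().reset_index(name="n")

print(summed)
print(sized)
```

The two results agree here only because every value in the count column is 1; if the column held other values, the sum and the row count would differ.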