Python pandas: get the average of a groupby

Disclaimer: this page is a translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/40066837/

Time: 2020-08-19 23:08:07  Source: igfitidea

python pandas dataframe group-by

Question by jxn

I am trying to find the average monthly cost per user_id, but I am only able to get either the average cost per user or the monthly cost per user.

Because I group by user and month, there is no way to get the average over the second groupby key (month) unless I transform the groupby output into something else.

This is my df:

    import pandas as pd

    # Sample data: user id, cost and month for each row
    df = pd.DataFrame({'id':   [1, 1, 1, 1, 2, 2, 2, 2],
                       'cost': [10, 20, 30, 40, 50, 60, 70, 80],
                       'mth':  [3, 3, 4, 5, 3, 4, 4, 5]})

   cost  id  mth
0    10   1    3
1    20   1    3
2    30   1    4
3    40   1    5
4    50   2    3
5    60   2    4
6    70   2    4
7    80   2    5

I can get the monthly sum, but I want the average of those monthly sums for each user_id.

df.groupby(['id','mth'])['cost'].sum()

id  mth
1   3       30
    4       30
    5       40
2   3       50
    4      130
    5       80

I want something like this:

id  average_monthly
1   (30+30+40)/3
2   (50+130+80)/3
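
As a quick check of the target numbers: a user's monthly sums add up to that user's total cost, so the desired average equals the total cost divided by the number of distinct months. A minimal sketch on the sample data above (this shortcut is an illustration, not part of the original question or answer):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'id':   [1, 1, 1, 1, 2, 2, 2, 2],
                   'cost': [10, 20, 30, 40, 50, 60, 70, 80],
                   'mth':  [3, 3, 4, 5, 3, 4, 4, 5]})

# Total cost per user: id 1 -> 100, id 2 -> 260
total = df.groupby('id')['cost'].sum()
# Number of distinct months per user: 3 for both users
n_months = df.groupby('id')['mth'].nunique()

# Average of the monthly sums = total / number of months
average_monthly = total / n_months
print(average_monthly)
```

This only works because the per-month aggregate is a sum; the two-step groupby in the answer works for any per-month aggregate.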

Answer by Jerome Montino

Resetting the index should work. Try this:

In [19]: df.groupby(['id', 'mth']).sum().reset_index().groupby('id').mean()  
Out[19]: 
    mth       cost
id                
1   4.0  33.333333
2   4.0  86.666667
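
The mth column in that output is just the mean month number and has no real meaning. A minimal variation (not from the original answer) selects cost before aggregating and names the result average_monthly, matching the layout asked for in the question:

```python
import pandas as pd

df = pd.DataFrame({'id':   [1, 1, 1, 1, 2, 2, 2, 2],
                   'cost': [10, 20, 30, 40, 50, 60, 70, 80],
                   'mth':  [3, 3, 4, 5, 3, 4, 4, 5]})

result = (df.groupby(['id', 'mth'])['cost'].sum()   # Series: monthly cost per (id, mth)
            .reset_index()                          # columns: id, mth, cost
            .groupby('id')['cost'].mean()           # average over each user's months
            .reset_index(name='average_monthly'))   # columns: id, average_monthly
print(result)
```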

You can just drop mth if you want. The logic is that after the sum part, you have this:

In [20]: df.groupby(['id', 'mth']).sum()
Out[20]: 
        cost
id mth      
1  3      30
   4      30
   5      40
2  3      50
   4     130
   5      80
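
Because this intermediate result is indexed by (id, mth), the reset_index step is not strictly required: the summed Series can be grouped again by its id index level using the level argument of groupby. A sketch of that alternative:

```python
import pandas as pd

df = pd.DataFrame({'id':   [1, 1, 1, 1, 2, 2, 2, 2],
                   'cost': [10, 20, 30, 40, 50, 60, 70, 80],
                   'mth':  [3, 3, 4, 5, 3, 4, 4, 5]})

# Monthly cost per user, indexed by the MultiIndex (id, mth)
monthly = df.groupby(['id', 'mth'])['cost'].sum()

# Group by the 'id' level of the index and average each user's monthly sums
average_monthly = monthly.groupby(level='id').mean()
print(average_monthly)
```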

Resetting the index at this point turns id and mth back into ordinary columns, giving one row per unique (id, month) pair.

In [21]: df.groupby(['id', 'mth']).sum().reset_index()
Out[21]: 
   id  mth  cost
0   1    3    30
1   1    4    30
2   1    5    40
3   2    3    50
4   2    4   130
5   2    5    80

It's just a matter of grouping it again, this time by id alone and using mean instead of sum. This should give you the averages.
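
Another way to see the same computation, offered as an alternative the answer does not use: pivot the data into a wide table with one column per month (summing cost per cell) and take the mean across each row. A month with no rows for a user becomes NaN, and mean skips NaN by default, so only months that actually occur are averaged, just as in the groupby-twice approach.

```python
import pandas as pd

df = pd.DataFrame({'id':   [1, 1, 1, 1, 2, 2, 2, 2],
                   'cost': [10, 20, 30, 40, 50, 60, 70, 80],
                   'mth':  [3, 3, 4, 5, 3, 4, 4, 5]})

# Rows: id, columns: mth, cells: summed cost for that user and month
wide = df.pivot_table(index='id', columns='mth', values='cost', aggfunc='sum')

# Mean across the month columns for each user (NaN months are skipped)
average_monthly = wide.mean(axis=1)
print(wide)
print(average_monthly)
```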

Let us know if this helps.
