Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/33374486/

Date: 2020-09-14 00:06:58  Source: igfitidea

pandas groupby aggregate with grand total in the bottom

Tags: pandas, group-by, aggregate, subtotal

Question by Jason

Here is my code:

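from io import StringIO  # Python 3 home of StringIO

import pandas as pd

# The sample columns are separated by runs of whitespace, so split on \s+
df = pd.read_csv(StringIO('''Col1 Col2    A   B
A   D   1   6
A   E   2   7
B   D   3   8
B   E   4   9
C   D   5   19'''), sep=r'\s+')


# Bucket column A into the intervals (0, 2], (2, 6] and (6, 8]
df['buc1'] = pd.cut(df['A'], bins=[0, 2, 6, 8])

# Sum A and average B within each bucket ('mean' is the same as np.average here)
aggFunc = {'A': 'sum',
           'B': 'mean'
           }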

After running:

df.groupby(['buc1']).agg(aggFunc)

I get:

         A     B
buc1            
(0, 2]   3   6.5
(2, 6]  12  12.0
(6, 8] NaN   NaN

My questions are:

  1. How do I get rid of the bottom (6, 8] bucket, given that there are no values there?
  2. How do I add a grand total row at the bottom, as in a pivot table in Excel?

For instance:

buc1    A   B
(0, 2]  3   6.5
(2, 6]  12  12
Total   15  9.8

Note that the total row for the second column would be the average, not the sum.
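The note about the average matters because the grand total for B cannot be derived from the bucketed result alone. A small sketch of the arithmetic on the sample data (the variable names are illustrative only):

```python
import pandas as pd

# Column B from the sample data above
b = pd.Series([6, 7, 8, 9, 19])

# Grand-total average: mean over all five raw rows, (6+7+8+9+19)/5 = 49/5
overall_mean = b.mean()

# Averaging the two bucket averages instead gives a different number:
# rows 1-2 fall in (0, 2] and rows 3-5 fall in (2, 6]
bucket_means = pd.Series([b.iloc[:2].mean(), b.iloc[2:].mean()])  # 6.5 and 12.0
mean_of_means = bucket_means.mean()

print(overall_mean)   # 9.8
print(mean_of_means)  # 9.25
```

So the B entry of the total row has to be computed from the raw rows, not from the grouped output.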

Answer by ako

To drop the NA records, you can use the DataFrame .dropna() method.

# Cast the categorical buckets to plain objects, then drop rows containing NaN
df['buc1'] = df['buc1'].astype(object)
result = df.groupby(['buc1']).agg(aggFunc).dropna()
result


         A     B
buc1            
(0, 2]   3   6.5
(2, 6]  12  12.0
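On newer pandas versions, a sketch of another way to get rid of the empty (6, 8] bucket without casting or calling dropna(): pass observed=True to groupby so that only the bucket categories that actually occur in the data are kept. This assumes a pandas release that supports the observed parameter (it was added after 0.17); the sample data is rebuilt so the snippet runs on its own.

```python
from io import StringIO

import pandas as pd

df = pd.read_csv(StringIO('''Col1 Col2 A B
A D 1 6
A E 2 7
B D 3 8
B E 4 9
C D 5 19'''), sep=r'\s+')
df['buc1'] = pd.cut(df['A'], bins=[0, 2, 6, 8])

# observed=True keeps only the interval categories present in the data,
# so the empty (6, 8] bucket never appears in the result
result = df.groupby('buc1', observed=True).agg({'A': 'sum', 'B': 'mean'})
print(result)
```

This yields the same two rows as the dropna() version, (0, 2] and (2, 6], while keeping buc1 categorical.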

As for the marginal totals, in theory this should work:

result_alternative = pd.pivot_table(df,index='buc1',
                                    aggfunc=aggFunc,
                                    values=['A','B'],
                                    margins=True)
result_alternative

         A     B
buc1            
(0, 2]   3   6.5
(2, 6]  12  12.0
All     15   9.8

But it throws an error, which I think is a bug that should get fixed, so it may work on your machine (I am running version 0.17).

The bug report also describes a workaround for the categorical variable: just cast it to object first, and the above call should then work.

EDIT:

If you want to stay with the groupby semantics, you can add a row total like so:

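# Total row from the raw rows: sum of A, overall mean of B (not a sum of means)
rowtotal = pd.DataFrame({'A': [df['A'].sum()], 'B': [df['B'].mean()]},
                        index=['Total'])
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
result = pd.concat([result, rowtotal])
result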
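Putting both pieces together, a minimal self-contained sketch of the whole table. agg_with_total is a hypothetical helper name, not a pandas API; it assumes a pandas version that supports groupby(..., observed=True) and that every aggregation is a named reduction such as 'sum' or 'mean'.

```python
from io import StringIO

import pandas as pd


def agg_with_total(df, by, agg, total_label='Total'):
    """Group df by `by`, aggregate with `agg`, then append a grand-total row.

    Hypothetical helper, not part of pandas. The total row applies the same
    per-column aggregation to the ungrouped rows, so a 'mean' column gets the
    overall mean rather than an aggregate of the group results.
    """
    # observed=True drops empty interval buckets such as (6, 8]
    grouped = df.groupby(by, observed=True).agg(agg)
    # Cast the categorical index to object so a string label can be appended
    grouped.index = grouped.index.astype(object)

    # One-row frame with each column aggregated over all raw rows
    total = pd.DataFrame(
        {col: [df[col].agg(func)] for col, func in agg.items()},
        index=pd.Index([total_label], name=grouped.index.name),
    )
    return pd.concat([grouped, total])


df = pd.read_csv(StringIO('''Col1 Col2 A B
A D 1 6
A E 2 7
B D 3 8
B E 4 9
C D 5 19'''), sep=r'\s+')
df['buc1'] = pd.cut(df['A'], bins=[0, 2, 6, 8])

table = agg_with_total(df, 'buc1', {'A': 'sum', 'B': 'mean'})
print(table)
```

Because each column of the total row reuses its own aggregation, A's total is the sum (15) and B's total is the overall mean (9.8), matching the Excel-style table in the question.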