pandas groupby aggregation with a total row at the bottom

Question

Asked by Jason

Here is my code:

import io

import pandas as pd

# Sample data; columns are separated by whitespace
df = pd.read_csv(io.StringIO('''Col1 Col2    A   B
A   D   1   6
A   E   2   7
B   D   3   8
B   E   4   9
C   D   5   19'''), sep=r'\s+')

# Bucket column A into the intervals (0, 2], (2, 6] and (6, 8]
df['buc1'] = pd.cut(df['A'], bins=[0, 2, 6, 8])

# Sum column A, average column B
aggFunc = {'A': 'sum',
           'B': 'mean'
           }

After running:

df.groupby(['buc1']).agg(aggFunc)

I get:

         A     B
buc1            
(0, 2]   3   6.5
(2, 6]  12  12.0
(6, 8] NaN   NaN

My questions are:

How do I get rid of the bottom (6, 8] bucket, given that there are no values there?
How do I add a grand total row at the bottom, as in a pivot table in Excel?

For instance:

buc1    A   B
(0, 2]  3   6.5
(2, 6]  12  12
Total   15  9.8

Note that the total row for the second column would be the average, not the sum.

Answer 1

Answered by ako

To just drop the NA records, you can use the DataFrame .dropna() method.

df['buc1'] = df['buc1'].astype(object) 
result = df.groupby(['buc1']).agg(aggFunc).dropna()
result


         A     B
buc1            
(0, 2]   3   6.5
(2, 6]  12  12.0
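
On newer pandas versions, an alternative to casting and calling .dropna() may be to pass observed=True to groupby, so that empty categorical buckets never appear in the result. The following is a minimal sketch, assuming a pandas version where groupby accepts the observed parameter; the sample data is rebuilt inline and the name agg_funcs is only illustrative.

```python
import io

import pandas as pd

# Rebuild the sample data from the question
df = pd.read_csv(io.StringIO("""Col1 Col2 A B
A D 1 6
A E 2 7
B D 3 8
B E 4 9
C D 5 19"""), sep=r"\s+")
df["buc1"] = pd.cut(df["A"], bins=[0, 2, 6, 8])

# Illustrative name: sum column A, average column B
agg_funcs = {"A": "sum", "B": "mean"}

# observed=True keeps only the buckets that actually contain rows,
# so the empty (6, 8] bucket is not part of the output
result = df.groupby("buc1", observed=True).agg(agg_funcs)
print(result)
```

Unlike .dropna(), this only leaves out buckets with no rows at all; a bucket whose aggregate happens to be NaN for another reason would still be kept.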

As for the marginal totals, in theory this should work:

result_alternative = pd.pivot_table(df,index='buc1',
                                    aggfunc=aggFunc,
                                    values=['A','B'],
                                    margins=True)
result_alternative

         A     B
buc1            
(0, 2]   3   6.5
(2, 6]  12  12.0
All     15   9.8

But it throws an error, which I think is a bug that should be fixed, so it may work on your machine (although I am running version 0.17).

The report of that bug includes a workaround related to the categorical variable: cast it to object first, and the above call should work.
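
The cast-then-pivot workaround can be sketched end to end as follows. This is an assumption-laden example rather than the exact code from the bug report: it casts the bucket column to str (instead of object) so the index holds plain labels, and the names agg_funcs and table are illustrative.

```python
import io

import pandas as pd

# Rebuild the sample data from the question
df = pd.read_csv(io.StringIO("""Col1 Col2 A B
A D 1 6
A E 2 7
B D 3 8
B E 4 9
C D 5 19"""), sep=r"\s+")

# Cast the categorical buckets to plain strings before pivoting
df["buc1"] = pd.cut(df["A"], bins=[0, 2, 6, 8]).astype(str)

# Illustrative name: sum column A, average column B
agg_funcs = {"A": "sum", "B": "mean"}

# margins=True appends an "All" row computed with the same per-column functions
table = pd.pivot_table(df, index="buc1", values=["A", "B"],
                       aggfunc=agg_funcs, margins=True)
print(table)
```

Because the buckets are strings here, the empty (6, 8] bucket does not show up at all, so no .dropna() call is needed.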

EDIT:

If you want to stay with the groupby semantics, you can add a row total like so:

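# Note: this sums every column, including B
rowtotal = result.sum()
rowtotal.name = 'rowtotal'
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
pd.concat([result, rowtotal.to_frame().T])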
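
Note that summing the aggregated result also sums column B, whereas the question asks for the overall average there. A possible sketch, assuming the totals should be computed from the original rows with the same per-column functions (the names agg_funcs and total are illustrative, not from the answer):

```python
import io

import pandas as pd

# Rebuild the sample data from the question
df = pd.read_csv(io.StringIO("""Col1 Col2 A B
A D 1 6
A E 2 7
B D 3 8
B E 4 9
C D 5 19"""), sep=r"\s+")
df["buc1"] = pd.cut(df["A"], bins=[0, 2, 6, 8])

# Illustrative name: sum column A, average column B
agg_funcs = {"A": "sum", "B": "mean"}

result = df.groupby("buc1", observed=True).agg(agg_funcs)
# Use plain string labels so a "Total" label can be added to the index
result.index = result.index.astype(str)

# Totals come from the raw rows, not from the per-bucket results,
# so B is the overall mean rather than a mean of bucket means
total = df.agg(agg_funcs)
result.loc["Total"] = total
print(result)
```

Computing the total from df rather than from result matters for B: the mean of the two bucket means would be 9.25, while the mean over all five rows is 9.8, which matches the figure in the question.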
