Original question: http://stackoverflow.com/questions/46708241/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
pandas groupby: how to calculate percentage of total?
Asked by Pythonista anonymous
How can I calculate a column showing the % of total in a groupby?
One way to do it is to calculate it manually after the groupby, as in the last line of this example:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(5, 8, (10, 4)), columns=['a', 'b', 'c', 'd'])
g = df.groupby('a').agg({'b': ['sum', 'mean'], 'c': ['sum'], 'd': ['sum']})
# Flatten the two-level column index into names like 'b_sum'
g.columns = g.columns.map('_'.join)
# Each group's share of the overall total of b
g['b %'] = g['b_sum'] / g['b_sum'].sum()
However, in my real data I have many more columns, and I'd need the % right after the sum, so with this approach I'd have to manually change the order of the columns.
Is there a more direct way of doing it so that the % is the column right after the sum? Note that I need the agg(), or something equivalent, because in all my groupbys I apply different aggregate functions to different columns (e.g. sum and avg of x, but only the min of y, etc.).
Thanks!
Accepted answer by jezrael
I think you need a lambda function in agg, and then replace the <lambda> part of the column name with %:
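np.random.seed(78)
df = pd.DataFrame(np.random.randint(5, 8, (10, 4)), columns=['a', 'b', 'c', 'd'])
# The lambda computes each group's share of the overall 'b' total;
# listing it right after 'sum' puts its column right after b_sum.
g = (df.groupby('a')
       .agg({'b': ['sum', lambda x: x.sum() / df['b'].sum(), 'mean'],
             'c': ['sum'],
             'd': ['sum']}))
# Flatten the column index and rename the lambda column to '%'.
# Newer pandas versions may label it '<lambda_0>', so match both forms.
g.columns = g.columns.map('_'.join).str.replace(r'<lambda(_\d+)?>', '%', regex=True)
print(g)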
   d_sum  c_sum  b_sum       b_%  b_mean
a
5     25     24     24  0.387097       6
6     11     11     14  0.225806       7
7     22     23     24  0.387097       6
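
As a possible alternative (not part of the original answer), named aggregation, available in pandas 0.25 and later, lets you choose the output column names and their order directly, so no renaming of a lambda column is needed. The sketch below assumes the same sample data as above; the names b_pct and b_total are illustrative only, and b_pct stands in for 'b_%' because keyword arguments cannot contain '%'.

```python
import numpy as np
import pandas as pd

np.random.seed(78)
df = pd.DataFrame(np.random.randint(5, 8, (10, 4)), columns=['a', 'b', 'c', 'd'])

# Overall total of 'b', computed once outside the groupby (illustrative name).
b_total = df['b'].sum()

# Named aggregation: each keyword is an output column name mapped to
# (input column, aggregation). Output columns follow the keyword order,
# so the percentage lands right after the sum.
g = df.groupby('a').agg(
    b_sum=('b', 'sum'),
    b_pct=('b', lambda x: x.sum() / b_total),
    b_mean=('b', 'mean'),
    c_sum=('c', 'sum'),
    d_sum=('d', 'sum'),
)
print(g)
```

If a column name such as 'b_%' is required, the same pairs can be passed as an unpacked dictionary, e.g. agg(**{'b_%': ('b', some_function)}).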