pandas groupby: how to calculate percentage of total?

Question

Asked by Pythonista anonymous

How can I calculate a column showing the % of total in a groupby?

One way to do it is to calculate it manually after the groupby, as in the last line of this example:

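import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(5, 8, (10, 4)), columns=['a', 'b', 'c', 'd'])
# Different aggregations per column; the result has MultiIndex columns
g = df.groupby('a').agg({'b': ['sum', 'mean'], 'c': ['sum'], 'd': ['sum']})
# Flatten ('b', 'sum') -> 'b_sum'
g.columns = g.columns.map('_'.join)
# Share of each group's b_sum in the grand total, appended as the last column
g['b %'] = g['b_sum'] / g['b_sum'].sum()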

However, in my real data I have many more columns, and I'd need the % right after the sum, so with this approach I'd have to manually change the order of the columns.
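A minimal sketch of one way to avoid reordering by hand, assuming the flattened *_sum column names produced by the example above: DataFrame.insert places a new column at a given integer position, so a share-of-total column can be put directly after every *_sum column in a loop. The " %" suffix just mirrors the "b %" name used in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(5, 8, (10, 4)), columns=['a', 'b', 'c', 'd'])
g = df.groupby('a').agg({'b': ['sum', 'mean'], 'c': ['sum'], 'd': ['sum']})
g.columns = g.columns.map('_'.join)

# Snapshot the *_sum columns first, because each insert changes g.columns
sum_cols = [col for col in g.columns if col.endswith('_sum')]
for col in sum_cols:
    pos = g.columns.get_loc(col) + 1  # integer position right after the sum
    g.insert(pos, col[:-len('_sum')] + ' %', g[col] / g[col].sum())

print(g)
```

Because the position is looked up again on every iteration, the loop keeps working however many aggregated columns sit between the sums.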

Is there a more direct way of doing it so that the % is the column right after the sum? Note that I need the agg(), or something equivalent, because in all my groupbys I apply different aggregate functions to different columns (e.g. sum and avg of x, but only the min of y, etc.).
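One possible sketch, assuming pandas 0.25 or later: named aggregation accepts a different function for each column, like the dict form of agg(), and the output columns come out in the order the keywords are written, so the percentage can sit right after the sum. The output names (b_pct, c_min, ...) and the choice of min for column c are illustrative only, not taken from the original data.

```python
import numpy as np
import pandas as pd

np.random.seed(78)
df = pd.DataFrame(np.random.randint(5, 8, (10, 4)), columns=['a', 'b', 'c', 'd'])

total_b = df['b'].sum()  # grand total used as the denominator

# Each keyword is an output column given as (input column, aggregation);
# columns appear in the order written here.
g = df.groupby('a').agg(
    b_sum=('b', 'sum'),
    b_pct=('b', lambda x: x.sum() / total_b),
    b_mean=('b', 'mean'),
    c_min=('c', 'min'),
    d_sum=('d', 'sum'),
)
print(g)
```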

Thanks!

Answer 1

Accepted answer by jezrael

I think you need a lambda function in agg, and then to replace <lambda> in the column names with %:

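import numpy as np
import pandas as pd

np.random.seed(78)
df = pd.DataFrame(np.random.randint(5, 8, (10, 4)), columns=['a', 'b', 'c', 'd'])

# The lambda divides each group's sum of b by the grand total of b
g = (df.groupby('a')
       .agg({'b': ['sum', lambda x: x.sum() / df['b'].sum(), 'mean'],
             'c': ['sum'],
             'd': ['sum']}))
# Flatten the MultiIndex and rename 'b_<lambda>' to 'b_%'
g.columns = g.columns.map('_'.join).str.replace('<lambda>', '%', regex=False)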

print (g)

   d_sum  c_sum  b_sum       b_%  b_mean
a                                       
5     25     24     24  0.387097       6
6     11     11     14  0.225806       7
7     22     23     24  0.387097       6
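A variant of the same idea, sketched under the assumption that agg() labels each callable in a list by its __name__ (which is why the lambda shows up as <lambda>): a small hypothetical factory, pct_of_total, returns a function named pct, so the flattened columns come out as b_pct and c_pct without the str.replace step. This also avoids depending on the lambda label, which recent pandas versions turn into <lambda_0>, <lambda_1>, and so on when one column gets several lambdas.

```python
import numpy as np
import pandas as pd

np.random.seed(78)
df = pd.DataFrame(np.random.randint(5, 8, (10, 4)), columns=['a', 'b', 'c', 'd'])


def pct_of_total(total):
    """Hypothetical helper: returns an aggregation dividing a group's sum by total."""
    def pct(x):
        return x.sum() / total
    return pct  # agg() uses pct.__name__, so the column is labelled 'pct'


g = df.groupby('a').agg({
    'b': ['sum', pct_of_total(df['b'].sum()), 'mean'],
    'c': ['sum', pct_of_total(df['c'].sum())],
    'd': ['sum'],
})
g.columns = g.columns.map('_'.join)  # ('b', 'pct') -> 'b_pct'
print(g)
```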