Groupby sum and count on multiple columns in python
Source: http://stackoverflow.com/questions/48768650/
Note: this question and its answers come from StackOverFlow and are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors.
Asked by Nofy
I have a pandas dataframe that looks like this
ID   country  month   revenue  profit  ebit
234  USA      201409  10       5       3
344  USA      201409  9        7       2
532  UK       201410  20       10      5
129  Canada   201411  15       10      5
I want to group by country and month, count the IDs in each group, and sum the revenue, profit and ebit. The output for the above data would be:
country  month   revenue  profit  ebit  count
USA      201409  19       12      5     2
UK       201410  20       10      5     1
Canada   201411  15       10      5     1
I have tried different variations of groupby, sum and count functions of pandas but I am unable to figure out how to apply groupby sum and count all together to give the result as shown. Please share any ideas that you might have. Thanks!
Answered by Mabel Villalba
It can be done using pivot_table this way:
>>> import pandas as pd
>>> df1=pd.pivot_table(df, index=['country','month'],values=['revenue','profit','ebit'],aggfunc='sum')
>>> df1
                ebit  profit  revenue
country month
Canada  201411     5      10       15
UK      201410     5      10       20
USA     201409     5      12       19
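>>> # passing values as a list makes pivot_table return a DataFrame, whose 'ID' column is then renamed
>>> df2=pd.pivot_table(df, index=['country','month'], values=['ID'],aggfunc=len).rename(columns={'ID':'count'})
>>> df2
                count
country month
Canada  201411      1
UK      201410      1
USA     201409      2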
>>> pd.concat([df1,df2],axis=1)
                ebit  profit  revenue  count
country month
Canada  201411     5      10       15      1
UK      201410     5      10       20      1
USA     201409     5      12       19      2
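For readers who want to run the pivot_table approach end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample rows in the question; the variable names `keys`, `sums`, `counts` and `result` are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [234, 344, 532, 129],
    "country": ["USA", "USA", "UK", "Canada"],
    "month": [201409, 201409, 201410, 201411],
    "revenue": [10, 9, 20, 15],
    "profit": [5, 7, 10, 10],
    "ebit": [3, 2, 5, 5],
})

keys = ["country", "month"]

# Sum the numeric columns for each (country, month) pair.
sums = pd.pivot_table(df, index=keys, values=["revenue", "profit", "ebit"], aggfunc="sum")

# Count the non-null IDs for each (country, month) pair.
# values is passed as a list so the result is a DataFrame with an "ID" column.
counts = pd.pivot_table(df, index=keys, values=["ID"], aggfunc="count")
counts = counts.rename(columns={"ID": "count"})

# Both tables share the same (country, month) index, so they line up side by side.
result = pd.concat([sums, counts], axis=1).reset_index()
print(result)
```

Calling `reset_index()` at the end turns country and month back into ordinary columns, which is closer to the layout asked for in the question.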
Answered by Ben
You can do the groupby, and then map the counts of each country to a new column.
# select several columns with a list; the bare 'a', 'b', 'c' form is rejected by recent pandas
g = df.groupby(['country', 'month'])[['revenue', 'profit', 'ebit']].sum().reset_index()
g['count'] = g['country'].map(df['country'].value_counts())
g
Out[3]:
  country   month  revenue  profit  ebit  count
0  Canada  201411       15      10     5      1
1      UK  201410       20      10     5      1
2     USA  201409       19      12     5      2
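One caveat with this mapping: `value_counts()` on the country column counts rows per country across all months, not per (country, month) group. The sketch below adds a second UK row in a different month to show the difference. The ID 777 for that row is made up for illustration.

```python
import pandas as pd

# Question data plus one extra UK row in a different month (ID 777 is hypothetical).
df = pd.DataFrame({
    "ID": [234, 344, 532, 129, 777],
    "country": ["USA", "USA", "UK", "Canada", "UK"],
    "month": [201409, 201409, 201410, 201411, 201411],
    "revenue": [10, 9, 20, 15, 10],
    "profit": [5, 7, 10, 10, 5],
    "ebit": [3, 2, 5, 5, 2],
})

g = df.groupby(["country", "month"])[["revenue", "profit", "ebit"]].sum().reset_index()

# Rows per country, ignoring month.
per_country = df["country"].value_counts()
g["count"] = g["country"].map(per_country)

# Rows per (country, month), for comparison.
per_group = df.groupby(["country", "month"]).size().tolist()

print(g)
print(per_group)
```

Both UK rows receive a count of 2 from the mapping, although each (UK, month) group contains a single row.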
Edit
To get the counts per country and month, you can do another groupby, and then join the two DataFrames together.
# select several columns with a list; the bare 'a', 'b', 'c' form is rejected by recent pandas
g = df.groupby(['country', 'month'])[['revenue', 'profit', 'ebit']].sum()
# size() gives the number of rows in each (country, month) group
j = df.groupby(['country', 'month']).size().to_frame('count')
pd.merge(g, j, left_index=True, right_index=True).reset_index()
Out[6]:
  country   month  revenue  profit  ebit  count
0  Canada  201411       15      10     5      1
1      UK  201410       20      10     5      1
2      UK  201411       10       5     2      1
3     USA  201409       19      12     5      2
I added another record for the UK with a different date - notice how there are now two UK entries in the merged DataFrame, with the appropriate counts.
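As a possible alternative that neither answer uses, the sums and the count can be computed in a single `groupby` with named aggregation. This is only a sketch, and it assumes pandas 0.25 or later, where named aggregation is available. The DataFrame is rebuilt from the question's sample rows.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [234, 344, 532, 129],
    "country": ["USA", "USA", "UK", "Canada"],
    "month": [201409, 201409, 201410, 201411],
    "revenue": [10, 9, 20, 15],
    "profit": [5, 7, 10, 10],
    "ebit": [3, 2, 5, 5],
})

# Each keyword names an output column; the tuple is (input column, aggregation).
result = (
    df.groupby(["country", "month"], as_index=False)
      .agg(
          revenue=("revenue", "sum"),
          profit=("profit", "sum"),
          ebit=("ebit", "sum"),
          count=("ID", "count"),  # number of non-null IDs per group
      )
)
print(result)
```

Because the count is computed inside the same grouping, no separate merge or concat step is needed, and the columns come out in the order given in the `agg` call.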