pandas: group by multiple columns with sum and count in Python

Question

Asked by Nofy

I have a pandas DataFrame that looks like this:

ID     country   month   revenue  profit   ebit
234    USA       201409   10        5       3
344    USA       201409    9        7       2
532    UK        201410    20       10      5
129    Canada    201411    15       10      5

I want to group by country and month, count the IDs in each group, and sum the revenue, profit and ebit columns. The output for the above data would be:

 country   month    revenue   profit  ebit   count
   USA     201409     19        12      5      2
   UK      201410     20        10      5      1
   Canada  201411     15        10      5      1

I have tried different variations of the pandas groupby, sum and count functions, but I cannot figure out how to apply the grouped sum and count together to get the result shown above. Please share any ideas you might have. Thanks!
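
For reference, the sample table above can be rebuilt as a DataFrame so that the answers below can be run as written. This is only a sketch: the variable name `df` matches what the answers use, and the integer dtypes are an assumption, since the question does not state them.

```python
import pandas as pd

# Sample data copied from the table in the question.
# Integer dtypes for ID and month are assumed, not stated in the question.
df = pd.DataFrame({
    "ID": [234, 344, 532, 129],
    "country": ["USA", "USA", "UK", "Canada"],
    "month": [201409, 201409, 201410, 201411],
    "revenue": [10, 9, 20, 15],
    "profit": [5, 7, 10, 10],
    "ebit": [3, 2, 5, 5],
})

print(df)
```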

Answer 1

Answered by Mabel Villalba

It can be done using pivot_table this way:

>>> import pandas as pd
>>> # Sum the numeric columns per (country, month)
>>> df1 = pd.pivot_table(df, index=['country', 'month'], values=['revenue', 'profit', 'ebit'], aggfunc='sum')
>>> df1
                ebit  profit  revenue
country month                        
Canada  201411     5      10       15
UK      201410     5      10       20
USA     201409     5      12       19

>>> # Count rows per (country, month); pivot_table returns a DataFrame, so rename the column
>>> df2 = pd.pivot_table(df, index=['country', 'month'], values='ID', aggfunc=len).rename(columns={'ID': 'count'})
>>> df2
                count
country month        
Canada  201411      1
UK      201410      1
USA     201409      2

>>> pd.concat([df1, df2], axis=1)
                ebit  profit  revenue  count
country month                               
Canada  201411     5      10       15      1
UK      201410     5      10       20      1
USA     201409     5      12       19      2
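
The two pivot tables can probably be collapsed into one call, because `pivot_table` accepts a dict for `aggfunc` that maps each column to its own aggregation. The following is a minimal sketch of that variation, not the answerer's code; the sample frame is rebuilt from the question's table and the name `summary` is illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question's table (dtypes assumed).
df = pd.DataFrame({
    "ID": [234, 344, 532, 129],
    "country": ["USA", "USA", "UK", "Canada"],
    "month": [201409, 201409, 201410, 201411],
    "revenue": [10, 9, 20, 15],
    "profit": [5, 7, 10, 10],
    "ebit": [3, 2, 5, 5],
})

# One pivot_table call: sum the money columns and count the IDs per group.
summary = pd.pivot_table(
    df,
    index=["country", "month"],
    values=["revenue", "profit", "ebit", "ID"],
    aggfunc={"revenue": "sum", "profit": "sum", "ebit": "sum", "ID": "count"},
).rename(columns={"ID": "count"})  # the ID count becomes the 'count' column

print(summary)
```

The result is indexed by (country, month), as in the concatenated table above; calling `.reset_index()` turns the index levels back into ordinary columns.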

Answer 2

Answered by Ben

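You can do the groupby and then map each country's row count to a new column. Note that this counts rows per country rather than per country and month, so it only matches the desired output while each country appears in a single month.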

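# Sum the numeric columns per (country, month); select several columns with a list, [[...]]
g = df.groupby(['country', 'month'])[['revenue', 'profit', 'ebit']].sum().reset_index()
# Map each country's total row count onto the grouped frame
g['count'] = g['country'].map(df['country'].value_counts())
g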

Out[3]:


    country  month   revenue  profit  ebit  count
0   Canada   201411  15       10      5     1
1   UK       201410  20       10      5     1
2   USA      201409  19       12      5     2

Edit

To get the counts per country and month, you can do another groupby, and then join the two DataFrames together.

g = df.groupby(['country', 'month'])[['revenue', 'profit', 'ebit']].sum()
# size() counts the rows in each (country, month) group
j = df.groupby(['country', 'month']).size().to_frame('count')
pd.merge(g, j, left_index=True, right_index=True).reset_index()

Out[6]:

    country  month   revenue  profit  ebit  count
0   Canada   201411  15       10      5     1
1   UK       201410  20       10      5     1
2   UK       201411  10       5       2     1
3   USA      201409  19       12      5     2

I added another record for the UK with a different month (201411); notice that the merged DataFrame now has two UK entries, each with the appropriate count.
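
On pandas 0.25 or later, the two groupbys and the merge can likely be replaced by a single groupby with named aggregation. This is a sketch of that alternative rather than the answerer's method; the sample frame is rebuilt from the question's table (without the extra UK record) and the name `result` is illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question's table (dtypes assumed).
df = pd.DataFrame({
    "ID": [234, 344, 532, 129],
    "country": ["USA", "USA", "UK", "Canada"],
    "month": [201409, 201409, 201410, 201411],
    "revenue": [10, 9, 20, 15],
    "profit": [5, 7, 10, 10],
    "ebit": [3, 2, 5, 5],
})

# Named aggregation: each keyword is an output column,
# given as (input column, aggregation function).
result = (
    df.groupby(["country", "month"])
      .agg(
          revenue=("revenue", "sum"),
          profit=("profit", "sum"),
          ebit=("ebit", "sum"),
          count=("ID", "count"),
      )
      .reset_index()
)

print(result)
```

Because `count` is also a DataFrame method, the new column has to be read as `result["count"]` rather than `result.count`.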
