Disclaimer: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/43172970/

Date: 2020-08-19 22:36:09  Source: igfitidea

Python pandas groupby aggregate on multiple columns, then pivot

Tags: python, pandas, dataframe, pivot, data-cleaning

Asked by Davide Tamburrino

In Python, I have a pandas DataFrame similar to the following:

Item  | shop1 | shop2 | shop3 | Category
------|-------|-------|-------|-----------
Shoes | 45    | 50    | 53    | Clothes
TV    | 200   | 300   | 250   | Technology
Book  | 20    | 17    | 21    | Books
phone | 300   | 350   | 400   | Technology
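For anyone who wants to run the answers below, a minimal sketch that builds the sample table as a DataFrame. The variable name df matches what the answers assume; the values are copied from the table above.

```python
import pandas as pd

# Sample data from the question: one row per item, one price column per shop
df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})
print(df)
```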

Here shop1, shop2 and shop3 are the prices of each item in three different shops. After some data cleaning, I need to return a DataFrame like this one:

Category (index) | size | sum | mean | std
-----------------|------|-----|------|-----

where size is the number of items in each category, and sum, mean and std are those functions applied to the prices across the 3 shops. How can I do this with the split-apply-combine pattern (groupby, aggregate, apply, ...)?

Can someone help me out? I'm going crazy with this one...thank you!

Accepted answer by piRSquared

Edited for pandas 0.22+, since using dictionaries to rename columns in a groupby aggregation is deprecated.

We set up a very similar dictionary, using its keys to specify the aggregation functions and the dictionary itself to rename the resulting columns.

# keys are aggregation function names; values are the display names
rnm_cols = dict(size='Size', sum='Sum', mean='Mean', std='Std')
df.set_index(['Category', 'Item']).stack().groupby(level='Category') \
  .agg(list(rnm_cols)).rename(columns=rnm_cols)

            Size   Sum        Mean        Std
Category                                     
Books          3    58   19.333333   2.081666
Clothes        3   148   49.333333   4.041452
Technology     6  1800  300.000000  70.710678
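A self-contained sketch of the approach above, assuming df is built from the question's table. The stacked Series is given a name so it can be inspected on its own. Note that after stacking, size counts prices (items × shops), so Books gets 3 even though it contains a single item. list(rnm_cols) is passed rather than the dict keys view, since some pandas versions may reject a dict_keys object in agg.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

# Keys name the aggregation functions; values are the display names
rnm_cols = dict(size='Size', sum='Sum', mean='Mean', std='Std')

# Move Category/Item into the index so only the shop columns remain,
# then stack them into a single Series of prices
prices = df.set_index(['Category', 'Item']).stack()

# Aggregate every price in each category, then rename the result columns
result = (prices.groupby(level='Category')
                .agg(list(rnm_cols))
                .rename(columns=rnm_cols))
print(result)
```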


Option 1
Use agg

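# Renames via a dict passed to agg; newer pandas no longer supports this
# (it raises SpecificationError in pandas 1.0+)
agg_funcs = dict(Size='size', Sum='sum', Mean='mean', Std='std')
df.set_index(['Category', 'Item']).stack().groupby(level=0).agg(agg_funcs)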

                  Std   Sum        Mean  Size
Category                                     
Books        2.081666    58   19.333333     3
Clothes      4.041452   148   49.333333     3
Technology  70.710678  1800  300.000000     6
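Passing a renaming dict to agg on a grouped Series does not run on recent pandas. A sketch of the same result using named aggregation (keyword arguments to SeriesGroupBy.agg, available since pandas 0.25), assuming df is built from the question's table. Unlike the dict version, the output columns follow the keyword order.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

# One long Series of prices, indexed by (Category, Item, shop)
prices = df.set_index(['Category', 'Item']).stack()

# Named aggregation: each keyword becomes an output column,
# and its value names the aggregation function to apply
result = prices.groupby(level='Category').agg(
    Size='size', Sum='sum', Mean='mean', Std='std'
)
print(result)
```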

Option 2
More for less: use describe

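# Recent pandas returns this wide table directly from describe();
# older versions returned a stacked Series and needed a trailing .unstack()
df.set_index(['Category', 'Item']).stack().groupby(level=0).describe()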

            count        mean        std    min    25%    50%    75%    max
Category                                                                   
Books         3.0   19.333333   2.081666   17.0   18.5   20.0   20.5   21.0
Clothes       3.0   49.333333   4.041452   45.0   47.5   50.0   51.5   53.0
Technology    6.0  300.000000  70.710678  200.0  262.5  300.0  337.5  400.0
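describe returns more statistics than the question asks for, and it has no sum. A minimal sketch, assuming df is built from the question's table and a recent pandas where describe on a grouped Series already returns one row per group, that keeps only the requested columns:

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

prices = df.set_index(['Category', 'Item']).stack()

# count, mean, std, min, quartiles and max for each category
summary = prices.groupby(level='Category').describe()

# Keep only the statistics the question asked for (describe has no sum)
trimmed = summary[['count', 'mean', 'std']]
print(trimmed)
```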

Answer by Scott Boston

df.groupby('Category').agg({'Item':'size','shop1':['sum','mean','std'],'shop2':['sum','mean','std'],'shop3':['sum','mean','std']})
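This version keeps the three shops separate, and 'Item': 'size' counts rows of the original frame, i.e. items per category, which matches the question's definition of size (Technology has 2). A sketch that also flattens the resulting two-level column labels, assuming df is built from the question's table; std is NaN wherever a category has only one item.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

out = df.groupby('Category').agg({
    'Item': 'size',                     # number of items per category
    'shop1': ['sum', 'mean', 'std'],
    'shop2': ['sum', 'mean', 'std'],
    'shop3': ['sum', 'mean', 'std'],
})

# Flatten the two-level column labels, e.g. ('shop1', 'sum') -> 'shop1_sum'
out.columns = ['_'.join(col) for col in out.columns]
print(out)
```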

Or, if you want it across all shops:

df1 = df.set_index(['Item','Category']).stack().reset_index().rename(columns={'level_2':'Shops',0:'costs'})
df1.groupby('Category').agg({'Item':'size','costs':['sum','mean','std']})
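The renaming above targets 'level_2' and 0 because reset_index() gives the unnamed shop level the default name level_2 and names the unnamed values column 0. A sketch that prints the intermediate long-format frame, assuming df is built from the question's table:

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

stacked = df.set_index(['Item', 'Category']).stack()

# reset_index() names the unnamed shop level 'level_2' and the values column 0
df1 = stacked.reset_index().rename(columns={'level_2': 'Shops', 0: 'costs'})
print(df1.head())

# One row per (item, shop), so 'Item': 'size' now counts prices, not items
res = df1.groupby('Category').agg({'Item': 'size', 'costs': ['sum', 'mean', 'std']})
print(res)
```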

Answer by foglerit

If I understand correctly, you want to calculate aggregate metrics across all shops, not for each shop individually. To do that, you can first stack your dataframe and then group by Category:

stacked = df.set_index(['Item', 'Category']).stack().reset_index()
stacked.columns = ['Item', 'Category', 'Shop', 'Price']
stacked.groupby('Category').agg({'Price':['count','sum','mean','std']})

Which results in

           Price                             
           count   sum        mean        std
Category                                     
Books          3    58   19.333333   2.081666
Clothes        3   148   49.333333   4.041452
Technology     6  1800  300.000000  70.710678
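This answer uses count where the others use size. They agree here because every price is present, but count skips missing values while size counts every row. A small sketch on a hypothetical copy of the data where the TV has no shop2 price; it reshapes with melt instead of stack because whether stack drops NaN rows depends on the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's data: the TV has no shop2 price
df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, np.nan, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

# melt() reshapes to long format like stack(), but always keeps NaN rows
long_df = df.melt(id_vars=['Item', 'Category'], var_name='Shop', value_name='Price')

# count skips NaN prices; size counts every row in the group
counts = long_df.groupby('Category')['Price'].agg(['count', 'size'])
print(counts)
```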