Original URL: http://stackoverflow.com/questions/43172970/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Python pandas groupby aggregate on multiple columns, then pivot
Question by Davide Tamburrino
In Python, I have a pandas DataFrame similar to the following:
Item  | shop1 | shop2 | shop3 | Category
------+-------+-------+-------+-----------
Shoes |    45 |    50 |    53 | Clothes
TV    |   200 |   300 |   250 | Technology
Book  |    20 |    17 |    21 | Books
phone |   300 |   350 |   400 | Technology
Here shop1, shop2 and shop3 are the cost of each item in three different shops. After some data cleaning, I need to return a DataFrame like this one:
Category (index) | size | sum | mean | std
-----------------+------+-----+------+-----
where size is the number of items in each Category, and sum, mean and std are the corresponding functions applied to the prices in the 3 shops. How can I do this with the split-apply-combine pattern (groupby, aggregate, apply, ...)?
Can someone help me out? I'm going crazy with this one... Thank you!
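None of the answers below shows how `df` is built. A minimal sketch that recreates it from the table above (assuming the prices are plain integers and the column names are exactly as shown) and produces the long, one-price-per-row Series that the answers aggregate:

```python
import pandas as pd

# Sample data from the question; prices are assumed to be plain integers.
df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

# Move the identifiers into the index, then stack the three shop columns
# into a single Series: one price per (Category, Item, shop).
prices = df.set_index(['Category', 'Item']).stack()

print(prices)
print(len(prices))  # 4 items x 3 shops = 12 prices
```

Grouping this Series on its first index level (Category) is what every answer below does in one form or another.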
Accepted answer by piRSquared
Edited for pandas 0.22+, given the deprecation of passing a dictionary to a groupby aggregation in order to rename the output columns.
We set up a very similar dictionary: its keys specify the aggregation functions, and the dictionary itself is passed to rename to relabel the resulting columns.
rnm_cols = dict(size='Size', sum='Sum', mean='Mean', std='Std')
df.set_index(['Category', 'Item']).stack().groupby('Category') \
.agg(rnm_cols.keys()).rename(columns=rnm_cols)
Size Sum Mean Std
Category
Books 3 58 19.333333 2.081666
Clothes 3 148 49.333333 4.041452
Technology 6 1800 300.000000 70.710678
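The chained call above can be unpacked into separate steps. A sketch, assuming the sample `df` rebuilt from the question's table; it groups with `level='Category'` and passes `list(rnm_cols)` instead of `rnm_cols.keys()`, which should be equivalent to the original call:

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

# Keys name the aggregation functions; values are the desired column labels.
rnm_cols = dict(size='Size', sum='Sum', mean='Mean', std='Std')

# Step 1: one price per (Category, Item, shop).
prices = df.set_index(['Category', 'Item']).stack()

# Step 2: group on the Category index level and apply the functions by name;
# the resulting columns are named 'size', 'sum', 'mean', 'std'.
stats = prices.groupby(level='Category').agg(list(rnm_cols))

# Step 3: relabel the columns with the same dictionary.
stats = stats.rename(columns=rnm_cols)
print(stats)
```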
Option 1: use agg
# Passing a dict to a Series groupby agg() was deprecated in pandas 0.20 and removed in 1.0;
# named aggregation (pandas 0.25+) gives the same renamed columns.
df.set_index(['Category', 'Item']).stack().groupby(level=0).agg(Size='size', Sum='sum', Mean='mean', Std='std')
Size Sum Mean Std
Category
Books 3 58 19.333333 2.081666
Clothes 3 148 49.333333 4.041452
Technology 6 1800 300.000000 70.710678
Option 2 (more for less): use describe
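# In pandas 0.20+ describe() already returns one row per category; older versions needed a trailing .unstack()
df.set_index(['Category', 'Item']).stack().groupby(level=0).describe()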
count mean std min 25% 50% 75% max
Category
Books 3.0 19.333333 2.081666 17.0 18.5 20.0 20.5 21.0
Clothes 3.0 49.333333 4.041452 45.0 47.5 50.0 51.5 53.0
Technology 6.0 300.000000 70.710678 200.0 262.5 300.0 337.5 400.0
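`describe` reports count, mean and std but not sum, so it covers only part of what the question asks for. A sketch, assuming the sample `df` from the question, that adds the per-category sum and keeps just the requested columns (with `count` playing the role of size):

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

prices = df.set_index(['Category', 'Item']).stack()
by_cat = prices.groupby(level='Category')

# describe() yields count, mean, std, min, quartiles and max per category.
summary = by_cat.describe()

# describe() has no sum, so compute it separately; assignment aligns on Category.
summary['sum'] = by_cat.sum()

# Keep only the statistics requested in the question.
result = summary[['count', 'sum', 'mean', 'std']]
print(result)
```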
Answer by Scott Boston
df.groupby('Category').agg({'Item': 'size',
                            'shop1': ['sum', 'mean', 'std'],
                            'shop2': ['sum', 'mean', 'std'],
                            'shop3': ['sum', 'mean', 'std']})
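The per-shop version above returns two-level column labels such as ('shop1', 'sum'). If flat labels are easier to work with, one option is to join each pair, as in this sketch (the sample `df` is rebuilt from the question's table, and the `_` separator is an arbitrary choice):

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

# Per-shop statistics plus the number of items per category.
res = df.groupby('Category').agg({'Item': 'size',
                                  'shop1': ['sum', 'mean', 'std'],
                                  'shop2': ['sum', 'mean', 'std'],
                                  'shop3': ['sum', 'mean', 'std']})

# The columns are a two-level MultiIndex such as ('shop1', 'sum');
# join each pair into a single flat label such as 'shop1_sum'.
res.columns = ['_'.join(col) for col in res.columns]
print(res)
```

Because `std` uses `ddof=1` by default, the per-shop std is NaN for Books and Clothes, which contain a single item each.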
Or, if you want the statistics across all shops combined:
df1 = df.set_index(['Item','Category']).stack().reset_index().rename(columns={'level_2':'Shops',0:'costs'})
df1.groupby('Category').agg({'Item':'size','costs':['sum','mean','std']})
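The `rename` above relies on `reset_index` calling the unnamed shop level `level_2` and the values column `0`. A sketch of an alternative, assuming the same sample `df`, that names the shop axis and the values before resetting the index, so those default labels never appear:

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

# Name the column axis 'Shops' so the stacked index level carries that name,
# and name the stacked values 'costs'; reset_index then uses both names.
df1 = (df.set_index(['Item', 'Category'])
         .rename_axis(columns='Shops')
         .stack()
         .rename('costs')
         .reset_index())

res = df1.groupby('Category').agg({'Item': 'size', 'costs': ['sum', 'mean', 'std']})
print(res)
```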
Answer by foglerit
If I understand correctly, you want to calculate aggregate metrics across all shops, not for each shop individually. To do that, you can first stack your dataframe and then group by Category:
stacked = df.set_index(['Item', 'Category']).stack().reset_index()
stacked.columns = ['Item', 'Category', 'Shop', 'Price']
stacked.groupby('Category').agg({'Price':['count','sum','mean','std']})
Which results in
Price
count sum mean std
Category
Books 3 58 19.333333 2.081666
Clothes 3 148 49.333333 4.041452
Technology 6 1800 300.000000 70.710678
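The accepted answer aggregates with 'size' while this one uses 'count'. On the complete sample data both give 3, 3 and 6, but they diverge when a price is missing: size counts every row, count only the non-missing values. A sketch with one hypothetical missing price (Book in shop2 set to NaN); it reshapes with `melt` rather than `stack` because whether `stack` drops missing values has changed across pandas versions:

```python
import numpy as np
import pandas as pd

# Same data as the question, except one price is missing (hypothetical NaN).
df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, np.nan, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

# melt keeps the NaN row, giving one row per (Item, shop) pair.
long_df = df.melt(id_vars=['Item', 'Category'], var_name='Shop', value_name='Price')

# size counts rows (NaN included); count counts only non-missing prices.
res = long_df.groupby('Category')['Price'].agg(['size', 'count', 'sum', 'mean'])
print(res)
```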