Python pandas groupby: aggregate on multiple columns, then pivot

Question

Asked by Davide Tamburrino

In Python, I have a pandas DataFrame similar to the following:

Item | shop1 | shop2 | shop3 | Category
------------------------------------
Shoes| 45    | 50    | 53    | Clothes
TV   | 200   | 300   | 250   | Technology
Book | 20    | 17    | 21    | Books
phone| 300   | 350   | 400   | Technology

Where shop1, shop2 and shop3 are the costs of every item in different shops. Now, I need to return a DataFrame, after some data cleaning, like this one:

Category (index)| size| sum| mean | std
----------------------------------------

where size is the number of items in each Category, and sum, mean and std are those functions applied across the 3 shops. How can I do these operations with the split-apply-combine pattern (groupby, aggregate, apply, ...)?

Can someone help me out? I'm going crazy with this one...thank you!

Answer 1

Accepted answer by piRSquared

Edited for pandas 0.22+, where using a dictionary to rename columns in a groupby aggregation is deprecated.

We set up a very similar dictionary where we use the keys of the dictionary to specify our functions and the dictionary itself to rename the columns.

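rnm_cols = dict(size='Size', sum='Sum', mean='Mean', std='Std')
# stack() gathers the three shop columns into one long Series of prices;
# list(rnm_cols) passes the dictionary keys to agg as a plain list of function names
df.set_index(['Category', 'Item']).stack().groupby('Category') \
  .agg(list(rnm_cols)).rename(columns=rnm_cols)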

            Size   Sum        Mean        Std
Category                                     
Books          3    58   19.333333   2.081666
Clothes        3   148   49.333333   4.041452
Technology     6  1800  300.000000  70.710678
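
For anyone who wants to run this end to end, here is a minimal sketch. The DataFrame is typed in by hand from the table in the question, and the intermediate name prices is only illustrative. It also shows why Size is 6 for Technology: after stack() there is one row per item and shop, so 2 items in 3 shops give 6 prices.

```python
import pandas as pd

# Sample data copied by hand from the table in the question.
df = pd.DataFrame({
    "Item": ["Shoes", "TV", "Book", "phone"],
    "shop1": [45, 200, 20, 300],
    "shop2": [50, 300, 17, 350],
    "shop3": [53, 250, 21, 400],
    "Category": ["Clothes", "Technology", "Books", "Technology"],
})

# stack() moves the three shop columns into the innermost index level,
# leaving one long Series with one price per (Category, Item, shop).
prices = df.set_index(["Category", "Item"]).stack()

rnm_cols = dict(size="Size", sum="Sum", mean="Mean", std="Std")
result = (
    prices.groupby(level="Category")
    .agg(list(rnm_cols))        # the keys are the function names to apply
    .rename(columns=rnm_cols)   # the dict itself maps them to display names
)
print(result)
```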

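Option 1
Use agg. This is the original form of the answer: it passes a renaming dictionary to agg on a grouped Series, which is the deprecated usage mentioned above and raises SpecificationError in pandas 1.0 and later.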

agg_funcs = dict(Size='size', Sum='sum', Mean='mean', Std='std')
df.set_index(['Category', 'Item']).stack().groupby(level=0).agg(agg_funcs)

                  Std   Sum        Mean  Size
Category                                     
Books        2.081666    58   19.333333     3
Clothes      4.041452   148   49.333333     3
Technology  70.710678  1800  300.000000     6
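
On newer pandas the same renaming can be written with named aggregation. This is a sketch assuming pandas 0.25 or later, not part of the original answer; the sample DataFrame is again typed in from the question, and the columns come out in the order the keywords are given rather than the order shown above.

```python
import pandas as pd

# Sample data copied by hand from the table in the question.
df = pd.DataFrame({
    "Item": ["Shoes", "TV", "Book", "phone"],
    "shop1": [45, 200, 20, 300],
    "shop2": [50, 300, 17, 350],
    "shop3": [53, 250, 21, 400],
    "Category": ["Clothes", "Technology", "Books", "Technology"],
})

prices = df.set_index(["Category", "Item"]).stack()

# Named aggregation on a grouped Series: each keyword is the output
# column name and its value is the aggregation to run.
result = prices.groupby(level=0).agg(
    Size="size", Sum="sum", Mean="mean", Std="std"
)
print(result)
```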

Option 2
More output for less code: use describe.

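# Since pandas 0.20, describe() on a groupby already returns one row per group,
# so the trailing unstack() from the original answer is no longer needed.
df.set_index(['Category', 'Item']).stack().groupby(level=0).describe()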

            count        mean        std    min    25%    50%    75%    max
Category                                                                   
Books         3.0   19.333333   2.081666   17.0   18.5   20.0   20.5   21.0
Clothes       3.0   49.333333   4.041452   45.0   47.5   50.0   51.5   53.0
Technology    6.0  300.000000  70.710678  200.0  262.5  300.0  337.5  400.0

Answer 2

Answered by Scott Boston

df.groupby('Category').agg({'Item':'size','shop1':['sum','mean','std'],'shop2':['sum','mean','std'],'shop3':['sum','mean','std']})

Or if you want it across all shops then:

df1 = df.set_index(['Item','Category']).stack().reset_index().rename(columns={'level_2':'Shops',0:'costs'})
df1.groupby('Category').agg({'Item':'size','costs':['sum','mean','std']})
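
A runnable sketch of the per-shop variant, with the sample data typed in from the question. Two things here are my own additions rather than part of the answer: the dictionary is built with a comprehension (and 'size' is wrapped in a list), and the two-level column index that agg returns is flattened into names such as shop1_sum. Note that std is NaN per shop for Books and Clothes, because each of those categories has only one item.

```python
import pandas as pd

# Sample data copied by hand from the table in the question.
df = pd.DataFrame({
    "Item": ["Shoes", "TV", "Book", "phone"],
    "shop1": [45, 200, 20, 300],
    "shop2": [50, 300, 17, 350],
    "shop3": [53, 250, 21, 400],
    "Category": ["Clothes", "Technology", "Books", "Technology"],
})

shops = ["shop1", "shop2", "shop3"]
stats = ["sum", "mean", "std"]

# One list of statistics per shop column, plus the item count per category.
spec = {"Item": ["size"], **{shop: stats for shop in shops}}
per_shop = df.groupby("Category").agg(spec)

# The result has two-level columns like ('shop1', 'sum');
# join the levels to get flat names like 'shop1_sum'.
per_shop.columns = ["_".join(col) for col in per_shop.columns]
print(per_shop)
```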

Answer 3

Answered by foglerit

If I understand correctly, you want to calculate aggregate metrics across all shops, not for each shop individually. To do that, you can first stack your DataFrame and then group by Category:

stacked = df.set_index(['Item', 'Category']).stack().reset_index()
stacked.columns = ['Item', 'Category', 'Shop', 'Price']
stacked.groupby('Category').agg({'Price':['count','sum','mean','std']})

This results in:

           Price                             
           count   sum        mean        std
Category                                     
Books          3    58   19.333333   2.081666
Clothes        3   148   49.333333   4.041452
Technology     6  1800  300.000000  70.710678
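
Note that count here is the number of prices (items times shops), while the question asks for size to be the number of items in each Category. If that is what is wanted, one possible variation is to count distinct items with nunique. This is a sketch assuming pandas 0.25 or later for the named-aggregation syntax; the sample data is typed in from the question and the output column names are only illustrative.

```python
import pandas as pd

# Sample data copied by hand from the table in the question.
df = pd.DataFrame({
    "Item": ["Shoes", "TV", "Book", "phone"],
    "shop1": [45, 200, 20, 300],
    "shop2": [50, 300, 17, 350],
    "shop3": [53, 250, 21, 400],
    "Category": ["Clothes", "Technology", "Books", "Technology"],
})

stacked = df.set_index(["Item", "Category"]).stack().reset_index()
stacked.columns = ["Item", "Category", "Shop", "Price"]

# size counts distinct items per category; the other columns
# aggregate every price across all three shops.
result = stacked.groupby("Category").agg(
    size=("Item", "nunique"),
    sum=("Price", "sum"),
    mean=("Price", "mean"),
    std=("Price", "std"),
)
print(result)
```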
