Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/19035536/

Time: 2020-09-13 21:11:40  Source: igfitidea

How to use pandas to group pivot table results by week?

Tags: python, sql, group-by, pandas

Asked by jxn

Below is a snippet of my pivot table output in .csv format after using the pandas pivot_table function:

Sub-Product     11/1/12 11/2/12 11/3/12 11/4/12 11/5/12 11/6/12
GP  Acquisitions    164    168     54      72     203    167
GP  Applications    190    207     65      91     227    200
GPF Acquisitions    1124   1142    992    1053    1467   1198
GPF Applications    1391   1430   1269    1357    1855   1510

The only thing I need to do now is use groupby in pandas to sum the values by week for each Sub-Product before I write the result to a .csv file.

Below is the output I want, which I produced in Excel. The first column does not need to match exactly. The main thing I need is to group the days by week so that I get the sum of the data per week (note how the top row groups the dates into 7-day spans). I am hoping to do this with Python/pandas. Is it possible?

Row Labels   11/4/12 - 11/10/12       11/11/12 - 11/17/12
GP      
Acquisitions       926                        728
Applications       1092                       889
GPF     
Acquisitions       8206                       6425
Applications       10527                      8894

Accepted answer by Dan Allan

The tool you need is resample, which implicitly uses groupby over a time period/frequency and applies a function like mean or sum.
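
To make the "implicit groupby" point concrete, here is a minimal sketch on a single hypothetical daily series (values taken from the GP Acquisitions row). It assumes only that the series has a DatetimeIndex; resample("W") and an explicit groupby on pd.Grouper(freq="W") should give the same weekly sums.

```python
import pandas as pd

# Hypothetical daily series for 2012-11-01 .. 2012-11-06
# (values copied from the "GP Acquisitions" row of the question).
s = pd.Series(
    [164, 168, 54, 72, 203, 167],
    index=pd.date_range("2012-11-01", periods=6, freq="D"),
)

# resample("W") bins the timestamps into weeks and aggregates each bin...
weekly_resample = s.resample("W").sum()

# ...which is the same as grouping explicitly on a weekly Grouper.
weekly_groupby = s.groupby(pd.Grouper(freq="W")).sum()

print(weekly_resample)
```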

Read data.

In [2]: df
Out[2]: 
      Sub-Product  11/1/12  11/2/12  11/3/12  11/4/12  11/5/12  11/6/12
GP   Acquisitions      164      168       54       72      203      167
GP   Applications      190      207       65       91      227      200
GPF  Acquisitions     1124     1142      992     1053     1467     1198
GPF  Applications     1391     1430     1269     1357     1855     1510
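
The answer starts from an already-loaded df. One possible way to get there is sketched below, assuming the data is exactly the whitespace-separated snippet from the question (a real export may use commas and need different read_csv arguments). Because the header row has one fewer field than each data row, pandas uses the first column (GP/GPF) as the row index, which matches Out[2].

```python
import io

import pandas as pd

# The question's snippet pasted as text. Assumption: the real data has
# this whitespace-separated layout.
raw = """\
Sub-Product     11/1/12 11/2/12 11/3/12 11/4/12 11/5/12 11/6/12
GP  Acquisitions    164    168     54      72     203    167
GP  Applications    190    207     65      91     227    200
GPF Acquisitions    1124   1142    992    1053    1467   1198
GPF Applications    1391   1430   1269    1357    1855   1510
"""

# Header has 7 fields, data rows have 8, so the first column becomes the index.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df)
```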

Set up a MultiIndex.

In [4]: df = df.reset_index().set_index(['index', 'Sub-Product'])

In [5]: df
Out[5]: 
                    11/1/12  11/2/12  11/3/12  11/4/12  11/5/12  11/6/12
index Sub-Product                                                       
GP    Acquisitions      164      168       54       72      203      167
      Applications      190      207       65       91      227      200
GPF   Acquisitions     1124     1142      992     1053     1467     1198
      Applications     1391     1430     1269     1357     1855     1510
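
The same MultiIndex can also be built without the reset_index round trip, by appending Sub-Product to the existing index. The sketch below uses a hypothetical two-column frame shaped like Out[2]; the level name "index" is set by hand only to mirror the answer.

```python
import pandas as pd

# Hypothetical frame shaped like Out[2]: GP/GPF in an unnamed index,
# Sub-Product as an ordinary column (two date columns kept for brevity).
df = pd.DataFrame(
    {
        "Sub-Product": ["Acquisitions", "Applications"] * 2,
        "11/1/12": [164, 190, 1124, 1391],
        "11/2/12": [168, 207, 1142, 1430],
    },
    index=["GP", "GP", "GPF", "GPF"],
)

# Append the column to the existing index instead of resetting it first.
multi = df.set_index("Sub-Product", append=True)

# The first level is unnamed here; name it "index" to match the answer.
multi.index = multi.index.set_names("index", level=0)
print(multi)
```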

Parse the columns as proper datetimes. (They come in as strings.)

In [6]: df.columns = pd.to_datetime(df.columns)

In [7]: df
Out[7]: 
                    2012-11-01  2012-11-02  2012-11-03  2012-11-04  \
index Sub-Product                                                    
GP    Acquisitions         164         168          54          72   
      Applications         190         207          65          91   
GPF   Acquisitions        1124        1142         992        1053   
      Applications        1391        1430        1269        1357   

                    2012-11-05  2012-11-06  
index Sub-Product                           
GP    Acquisitions         203         167  
      Applications         227         200  
GPF   Acquisitions        1467        1198  
      Applications        1855        1510  
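
Without a format, pd.to_datetime has to guess how to read strings like "11/4/12". A small safeguard is to spell the format out. The '%m/%d/%y' pattern below is an assumption based on the dates shown in the question (month first, two-digit year).

```python
import pandas as pd

# Column labels from the question, still plain strings.
labels = ["11/1/12", "11/2/12", "11/3/12", "11/4/12", "11/5/12", "11/6/12"]

# Explicit month/day/two-digit-year format: "11/4/12" is read as
# 4 November 2012, with no guessing involved.
dates = pd.to_datetime(labels, format="%m/%d/%y")
print(dates)
```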

Resample the columns (axis=1) weekly ('w'), summing by week. (how='sum' or how=np.sum are both valid options here, in the pandas version this answer was written for.)

In [10]: df.resample('w', how='sum', axis=1)
Out[10]: 
                    2012-11-04  2012-11-11
index Sub-Product                         
GP    Acquisitions         458         370
      Applications         553         427
GPF   Acquisitions        4311        2665
      Applications        5447        3365
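
The transcript above uses the pandas API of its time. In recent pandas the how= argument is no longer accepted and resampling along axis=1 is deprecated, so the sketch below transposes, resamples, calls .sum(), and transposes back; it reproduces Out[10] on the six days from the question. It also addresses the week anchor: 'W' is shorthand for 'W-SUN', so bins run Monday to Sunday and are labelled by their Sunday, while the Excel table in the question groups Sunday to Saturday, which corresponds to 'W-SAT'. The "start - end" column labels are an illustrative extra, not part of the original answer.

```python
import pandas as pd

# Rebuild the answer's frame: (index, Sub-Product) rows, one datetime column per day.
dates = pd.to_datetime(
    ["11/1/12", "11/2/12", "11/3/12", "11/4/12", "11/5/12", "11/6/12"],
    format="%m/%d/%y",
)
rows = pd.MultiIndex.from_tuples(
    [("GP", "Acquisitions"), ("GP", "Applications"),
     ("GPF", "Acquisitions"), ("GPF", "Applications")],
    names=["index", "Sub-Product"],
)
df = pd.DataFrame(
    [[164, 168, 54, 72, 203, 167],
     [190, 207, 65, 91, 227, 200],
     [1124, 1142, 992, 1053, 1467, 1198],
     [1391, 1430, 1269, 1357, 1855, 1510]],
    index=rows,
    columns=dates,
)

# Modern spelling of df.resample('w', how='sum', axis=1):
# put the dates on the rows, resample and sum, then transpose back.
weekly = df.T.resample("W").sum().T  # "W" == "W-SUN": weeks end on Sunday

# Sunday-to-Saturday weeks, as in the Excel output, labelled by the Saturday.
weekly_sat = df.T.resample("W-SAT").sum().T

# Optional: relabel each week-end date as a "start - end" string.
weekly_sat.columns = [
    (end - pd.Timedelta(days=6)).strftime("%m/%d/%y") + " - " + end.strftime("%m/%d/%y")
    for end in weekly_sat.columns
]
print(weekly)
print(weekly_sat)
```

Either result can then be written out with to_csv, as the question intends.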