Pandas: Aggregate by month for every subgroup

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/28310252/

Time: 2020-09-13 22:54:32

python pandas resampling

Asked by FooBar

I have the following pandas table:

                          TUFNWGTP  TELFS  t070101  t070102  t070103  t070104  \
TUDIARYDATE status                                                              
2003-01-03  emp     8155462.672158      2        0        0        0        0   
2003-01-04  emp     1735322.527819      1        0        0        0        0   
            emp     3830527.482672      2       60        0        0        0   
2003-01-02  unemp   6622022.995205      4        0        0        0        0   
2003-01-09  emp     3068387.344956      1        0        0        0        0

and I want to aggregate the daily data to monthly data, for every subgroup.

That is, if there were no status sub-index, I would do

df.resample('M', how='sum')
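
For reference, here is a minimal sketch of that single-index monthly sum in the method-chaining form that recent pandas releases expect, where the how= keyword is no longer accepted. The frame name daily and its values are made up for illustration (loosely based on the rows above), and the month-end offset object is passed instead of the 'M' string because newer releases spell that alias 'ME'.

```python
import pandas as pd

# Hypothetical daily data with a plain DatetimeIndex (no "status" level).
daily = pd.DataFrame(
    {"TELFS": [4, 2, 1, 2, 1], "t070101": [0, 0, 0, 60, 0]},
    index=pd.DatetimeIndex(
        ["2003-01-02", "2003-01-03", "2003-01-04", "2003-01-04", "2003-01-09"],
        name="TUDIARYDATE",
    ),
)

# Bin the rows into month-end periods and sum each bin.
# Passing the offset object avoids depending on the alias spelling.
monthly = daily.resample(pd.offsets.MonthEnd()).sum()
print(monthly)
```

All five rows fall in January 2003, so the result is a single row labelled with the month end.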

How can I do the monthly aggregation for every subgroup?

如何对每个子组进行月度汇总?

Answered by Andy Hayden

I think you need to have a DatetimeIndex (rather than a MultiIndex):

In [11]: df1 = df.reset_index('status')

In [12]: df1
Out[12]:
            status        TUFNWGTP  TELFS  t070101  t070102  t070103  t070104
TUDIARYDATE
2003-01-03     emp  8155462.672158      2        0        0        0        0
2003-01-04     emp  1735322.527819      1        0        0        0        0
2003-01-04     emp  3830527.482672      2       60        0        0        0
2003-01-02   unemp  6622022.995205      4        0        0        0        0
2003-01-09     emp  3068387.344956      1        0        0        0        0

Then do a groupby with a monthly TimeGrouper and the status column:

In [13]: df1.groupby([pd.TimeGrouper('M'), 'status']).sum()
Out[13]:
                           TUFNWGTP  TELFS  t070101  t070102  t070103  t070104
TUDIARYDATE status
2003-01-31  emp     16789700.027605      6       60        0        0        0
            unemp    6622022.995205      4        0        0        0        0
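
In recent pandas releases pd.TimeGrouper is no longer available, and pd.Grouper with a freq argument plays the same role. The sketch below repeats the two steps of this answer on the sample rows from the question, under a few assumptions: only three of the value columns are rebuilt, the month-end offset object stands in for the 'M' alias (spelled 'ME' in newer releases), and the sort_index() call is an added precaution rather than part of the original answer.

```python
import pandas as pd

# Rebuild the sample rows from the question (three value columns only).
index = pd.MultiIndex.from_arrays(
    [
        pd.to_datetime(
            ["2003-01-03", "2003-01-04", "2003-01-04", "2003-01-02", "2003-01-09"]
        ),
        ["emp", "emp", "emp", "unemp", "emp"],
    ],
    names=["TUDIARYDATE", "status"],
)
df = pd.DataFrame(
    {
        "TUFNWGTP": [
            8155462.672158,
            1735322.527819,
            3830527.482672,
            6622022.995205,
            3068387.344956,
        ],
        "TELFS": [2, 1, 2, 4, 1],
        "t070101": [0, 0, 60, 0, 0],
    },
    index=index,
)

# Step 1: move "status" out of the index so only the dates remain there.
# Sorting by date is a precaution added here, not part of the original answer.
df1 = df.reset_index("status").sort_index()

# Step 2: group by month-end bins of the index together with the status column.
month_end = pd.Grouper(freq=pd.offsets.MonthEnd())
monthly = df1.groupby([month_end, "status"]).sum()
print(monthly)
```

The printed frame should have one row per (month end, status) pair, matching the emp and unemp totals shown in Out[13] for the columns included here.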