pandas: how to convert monthly data to quarterly in pandas
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/40497199/
Asked by alernerdev
I have monthly data. I want to convert it to "periods" of 3 months where q1 starts in January. So in the example below, the first 3 month aggregation would translate into start of q2 (desired format: 1996q2). And the data value that results from mushing together 3 monthly values is a mean (average) of 3 columns. Conceptually, not complicated. Does anyone know how to do it in one swoop? Potentially, I could do a lot of hard work through looping and just hardcode the hell out of it, but I am new to pandas and looking for something more clever than brute force.
1996-04  1996-05  1996-06  1996-07  .....
     25       19       37       40
So I am looking for:
1996q2  1996q3  1996q4  1997q1  1997q2  .....
   avg     avg     avg     ...     ...
Answered by MaxU
You can use pd.PeriodIndex(..., freq='Q') in conjunction with groupby(..., axis=1):
In [63]: df
Out[63]:
   1996-04  1996-05  2000-07  2000-08  2010-10  2010-11  2010-12
0        1        2        3        4        1        1        1
1       25       19       37       40        1        2        3
2       10       20       30       40        4        4        5
In [64]: df.groupby(pd.PeriodIndex(df.columns, freq='Q'), axis=1).mean()
Out[64]:
   1996Q2  2000Q3    2010Q4
0     1.5     3.5  1.000000
1    22.0    38.5  2.000000
2    15.0    35.0  4.333333
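For readers on newer pandas, here is a minimal, self-contained sketch of the same idea. It assumes a pandas version in which groupby(..., axis=1) is deprecated (recent releases emit a warning and suggest transposing instead), so it groups the transposed frame and transposes back. The names quarters and quarterly are illustrative and not part of the original answer. Note that freq='Q' means calendar quarters anchored to a December year end, so Q1 covers January through March, which matches the question.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame(
    [[1, 2, 3, 4, 1, 1, 1],
     [25, 19, 37, 40, 1, 2, 3],
     [10, 20, 30, 40, 4, 4, 5]],
    columns=["1996-04", "1996-05", "2000-07", "2000-08",
             "2010-10", "2010-11", "2010-12"],
)

# Map every monthly column label to its calendar quarter.
quarters = pd.PeriodIndex(df.columns, freq="Q")

# Group the transposed frame by quarter, average, then transpose back,
# which avoids passing axis=1 to groupby.
quarterly = df.T.groupby(quarters).mean().T
print(quarterly)
```

The result should match Out[64] above: one column per quarter, each holding the mean of whichever monthly columns fall into that quarter.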
UPDATE: to get the columns of the resulting DF as strings instead of period dtype:
In [66]: res = (df.groupby(pd.PeriodIndex(df.columns, freq='Q'), axis=1)
                  .mean()
                  .rename(columns=lambda c: str(c).lower()))
In [67]: res
Out[67]:
   1996q2  2000q3    2010q4
0     1.5     3.5  1.000000
1    22.0    38.5  2.000000
2    15.0    35.0  4.333333
In [68]: res.columns.dtype
Out[68]: dtype('O')
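As a variation on the update above, the lowercase string labels can be built before grouping instead of renaming afterwards. This is only a sketch under the same assumption as before (grouping the transposed frame rather than using axis=1); the sample row is taken from the question, and the names labels and res are illustrative.

```python
import pandas as pd

# One row of monthly values from the question.
df = pd.DataFrame(
    [[25, 19, 37, 40]],
    columns=["1996-04", "1996-05", "1996-06", "1996-07"],
)

# Convert the quarter periods to lowercase strings such as "1996q2".
labels = pd.PeriodIndex(df.columns, freq="Q").astype(str).str.lower()

# Group by the string labels, so the resulting columns are plain strings.
res = df.T.groupby(labels).mean().T
print(res)
```

Because labels of the form YYYYqN sort lexicographically in chronological order, the grouped columns stay in date order even though they are strings.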