在 Pandas 中按组计算同比增长
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/28328636/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Calculating year over year growth by group in Pandas
提问by invoker
I have the following dataframe:
我有以下几点dataframe:
In [1]: df
Out[1]:
ID Month Transaction_Amount
1 2013/01 10
1 2013/02 20
1 2013/03 10
1 2013/04 20
1 2013/05 10
1 2013/06 20
1 2013/07 10
1 2013/08 20
1 2013/09 10
1 2013/10 20
1 2013/11 10
1 2013/12 20
1 2014/01 15
1 2014/02 25
1 2014/03 15
1 2014/04 25
...
1 2014/11 15
1 2014/12 25
...
10000000 2014/11 13
10000000 2014/12 23
What I would like to do is calculate the growth over rolling month periods year over year, so for example, I would want to find the value of (2014/01 - 2013/01) / (2014/01)which is (15 - 10) / (15) = 1/3and save this for the first rolling period. There will be a total of 12 rolling periods for each ID. I'm thinking that the final output should look like:
我想这样做是计算增长滚动月期间较去年同期,因此,例如,我想找到的值(2014/01 - 2013/01) / (2014/01)是(15 - 10) / (15) = 1/3第一个滚动周期,节约这一点。每个 总共有 12 个滚动周期ID。我认为最终的输出应该是这样的:
In [2]: df_new
Out[2]:
ID rolling_period_1 rolling_period_2 ... rolling_period_12
1 .333333 .25 .25
2 x1 x2 x12
3 y1 y2 y12
4 z1 z2 z12
...
I generated a list containing tuples of every year over period [(2013/01, 2014/01), (2013/02, 2014/02) ... (2013/12, 2014/12)]and have been playing around with isinto index a subset of the original df, but I am unsure how to arrive at the df_new.
我生成了一个包含一段时间内每年元组的列表,[(2013/01, 2014/01), (2013/02, 2014/02) ... (2013/12, 2014/12)]并且一直在isin尝试索引原始的子集df,但我不确定如何到达df_new.
EDIT
编辑
I have created a new dataframe called temp_dfwith the following code:
我创建了一个temp_df使用以下代码调用的新数据框:
In [4]: temp_df = df[df['month'].isin(('2013/01','2014/01'))]
In [5]: temp_df
Out[5]:
ID Month Transaction_Amount
1 2013/01 10
1 2014/01 15
2 2013/01 20
2 2014/01 30
3 2013/01 15
3 2014/01 30
...
What I would like to produce is a DataFramethat looks like the following:
我想要制作的是一个DataFrame如下所示的:
In [6]: new_df
Out[6]:
ID Transaction_Growth
1 .3333 # (15-10)/15
2 .3333 # (30-20)/30
3 .50 # (30-15)/30
...
采纳答案by JAB
you can use shift to offset the rows in the dataframe.
您可以使用 shift 来偏移数据框中的行。
Create dummy data with month column and values column
使用月份列和值列创建虚拟数据
rng = pd.date_range('1/1/2011', periods=90, freq='M')
df = pd.DataFrame({'value':range(1,91),'date':rng})
set the month column to index
将月份列设置为索引
df = df.set_index('date')
shift a copy of the dataframe 12 periods to get the value 12 months ago, minus from the current record, and divide by current record:
将数据帧的副本移动 12 个周期以获得 12 个月前的值,减去当前记录,然后除以当前记录:
df - df.shift(12)/ df
update the sign on the period in the shift function
更新移位函数中的句号
Updated to consider ID
更新以考虑 ID
# Create range of months
rng = pd.date_range('1/1/2011', periods=180, freq='M')
ID = np.array([1,2,3])
# Create ID column
ID = np.repeat(ID,60)
# Create dummy data in dataframe
df = pd.DataFrame({'ID':ID,'value':range(1,181),'date':rng})
# Use shift on a group by object
(df.value - df.groupby(['ID']).value.shift(12))/ df.value
回答by Jonathan
I think there is a much more simple way to do this that doesn't require keeping track of shifting time periods, try using the df.pct_change()method:
我认为有一种更简单的方法可以做到这一点,不需要跟踪移动时间段,请尝试使用以下df.pct_change()方法:
import pandas as pd
import numpy as np
date_range = pd.period_range("2016-01", "2018-01",freq='m')
df= pd.DataFrame({'A':np.random.rand(len(date_range))}, index=date_range)
df['pct_pop'] = df['A'].pct_change()
df['pct_yoy'] = df['A'].pct_change(12)
df
A pct_pop pct_yoy
2016-01 0.478381 NaN NaN
2016-02 0.941450 0.967991 NaN
2016-03 0.128445 -0.863567 NaN
2016-04 0.498623 2.882011 NaN
2016-05 0.914663 0.834377 NaN
2016-06 0.349565 -0.617821 NaN
2016-07 0.563296 0.611419 NaN
2016-08 0.144055 -0.744264 NaN
2016-09 0.502279 2.486708 NaN
2016-10 0.621283 0.236928 NaN
2016-11 0.716813 0.153763 NaN
2016-12 0.152372 -0.787431 NaN
2017-01 0.160636 0.054234 -0.664209
2017-02 0.496759 2.092453 -0.472347
2017-03 0.324318 -0.347132 1.524965
2017-04 0.431651 0.330949 -0.134315
2017-05 0.973095 1.254357 0.063884
2017-06 0.007917 -0.991864 -0.977351
2017-07 0.875365 109.562870 0.554005
2017-08 0.860987 -0.016425 4.976784
2017-09 0.099549 -0.884378 -0.801805
2017-10 0.544275 4.467398 -0.123950
2017-11 0.433326 -0.203846 -0.395482
2017-12 0.688057 0.587850 3.515636
2018-01 0.924038 0.342967 4.752374

