How to get moving average of past months in Pandas
Original question: http://stackoverflow.com/questions/45825993/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not to this page): Stack Overflow.
Asked by Dylan
I have a data set whose first column is the Date and whose second column is the Price. The dates are trading days.
I want to return a table that looks like this:
where the date is each month starting from 2006, and the price MA is the average price over the past N months (N = [1, 2, 3, 4, 5, 6]).
For example, with N = 1 on Jan. 1, 2006, the MA should be the average price over December of the previous year. With N = 2, the MA should be the average price over November and December of the previous year.
I have read some solutions about extracting the month from a datetime and about groupby, but I don't know how to put them together.
Answered by YOBEN_S
Or simply try:
df.sort_index(ascending=False).rolling(5).mean().sort_index(ascending=True)
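To see what the reverse-sort trick does, here is a minimal sketch on made-up data (the `price` column and its values are assumptions for illustration). Reversing the index before `rolling` makes each window cover the current row and the rows that follow it, so the result is a forward-looking mean rather than a trailing one:

```python
import pandas as pd

# Hypothetical toy data: six consecutive days with prices 1..6.
index = pd.date_range(start="2017-07-04", periods=6, freq="D")
df = pd.DataFrame({"price": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=index)

# Ordinary rolling mean: each window ends at the current row (looks backward).
backward = df.rolling(3).mean()

# Reverse, roll, and restore the order: each window now starts at the
# current row and extends forward in time.
forward = df.sort_index(ascending=False).rolling(3).mean().sort_index(ascending=True)

print(backward["price"].tolist())
print(forward["price"].tolist())
```

In this sketch the trailing version has its missing values at the start of the series, while the reversed version has them at the end.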
For your additional question:
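import numpy as np
import pandas as pd

# 30 daily random values starting on 4 July 2017
index = pd.date_range(start="4th of July 2017", periods=30, freq="D")
df = pd.DataFrame(np.random.randint(0, 100, 30), index=index)
# label each row with its year-month string, e.g. "2017-07"
df['Month'] = df.index
df.Month = df.Month.astype(str).str[0:7]
df.groupby('Month')[0].mean()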
Out[162]:
Month
2017-07 47.178571
2017-08 56.000000
Name: 0, dtype: float64
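The string slice above is one way to label each row with its month. As an alternative sketch (not part of the original answer), the same monthly means can be grouped on monthly periods obtained from `DatetimeIndex.to_period`. The `price` column name and the fixed seed are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

# Fixed seed, only so the sketch is repeatable.
values = np.random.RandomState(0).randint(0, 100, 30)
index = pd.date_range(start="2017-07-04", periods=30, freq="D")
df = pd.DataFrame({"price": values}, index=index)

# Group on the calendar month of each timestamp instead of a string slice.
monthly_mean = df["price"].groupby(df.index.to_period("M")).mean()

# The string-slice grouping, for comparison.
by_string = df["price"].groupby(df.index.astype(str).str[:7]).mean()

print(monthly_mean)
```

Both groupings give the same numbers. The period-based index keeps its date semantics, which can make later slicing by month simpler.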
EDIT 3: a rolling two-month mean when some values are missing
index = pd.date_range(start="4th of July 2017", periods=300, freq="D")
df = pd.DataFrame(np.random.randint(0, 100, 300), index=index)
df['Month'] = df.index
df.Month = df.Month.astype(str).str[0:7]
# per-month sum and count of the observations
df = df.groupby('Month')[0].agg(['sum', 'count'])
# two-month mean: total of the values divided by the number of values
df['sum'].rolling(2).sum() / df['count'].rolling(2).sum()
Out[200]:
Month
2017-07 NaN
2017-08 43.932203
2017-09 45.295082
2017-10 46.967213
2017-11 46.327869
2017-12 49.081967
#etc
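One reason to divide rolled sums by rolled counts, rather than average the monthly means, is that months can hold different numbers of observations. A small sketch with made-up numbers (all values and the `price` name are assumptions) shows how the two results can differ:

```python
import pandas as pd

# Hypothetical data: July has 3 observations, August only 1.
index = pd.to_datetime(["2017-07-03", "2017-07-04", "2017-07-05", "2017-08-01"])
price = pd.Series([10.0, 20.0, 30.0, 100.0], index=index, name="price")

month = price.index.to_period("M")
stats = price.groupby(month).agg(["sum", "count", "mean"])

# Pooled two-month mean: every observation carries the same weight.
pooled = stats["sum"].rolling(2).sum() / stats["count"].rolling(2).sum()

# Mean of monthly means: every month carries the same weight.
mean_of_means = stats["mean"].rolling(2).mean()

print(pooled.iloc[-1])         # (10 + 20 + 30 + 100) / 4
print(mean_of_means.iloc[-1])  # (20 + 100) / 2
```

Which of the two is wanted depends on whether each day or each month should count equally.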
Answered by 2Obe
The code below returns the rolling mean over the number of periods given by the window parameter. For example, window=1 returns the original values, window=2 computes the mean over 2 days, and so on.
import numpy as np
import pandas as pd

index = pd.date_range(start="4th of July 2017", periods=30, freq="D")
df = pd.DataFrame(np.random.randint(0, 100, 30), index=index)
print([df.rolling(window=i).mean() for i in range(1, 5)])
.....
.....
2017-07-04 NaN
2017-07-05 20.5
2017-07-06 64.5
2017-07-07 58.5
2017-07-08 13.0
2017-07-09 4.5
2017-07-10 17.5
2017-07-11 23.5
2017-07-12 40.5
2017-07-13 60.0
2017-07-14 73.0
2017-07-15 90.0
2017-07-16 56.5
2017-07-17 55.0
2017-07-18 57.0
2017-07-19 45.0
2017-07-20 77.0
2017-07-21 46.5
2017-07-22 3.5
2017-07-23 48.5
2017-07-24 71.5
2017-07-25 52.0
2017-07-26 56.5
2017-07-27 47.5
2017-07-28 64.0
2017-07-29 82.0
2017-07-30 68.0
2017-07-31 72.5
2017-08-01 58.5
2017-08-02 67.0
.....
.....
You can also drop NA values with the DataFrame dropna method, for example:
df.rolling(window=2).mean().dropna()  # adjust the window size as needed
So the whole code, which should print the rolling mean over the months, is:
monthly = df.groupby(df.index.to_period("M")).mean()  # one mean per calendar month
print([monthly.rolling(i).mean().dropna() for i in range(1, len(monthly) + 1)])
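For the table requested in the question, one possible reading is: average the prices within each calendar month, take a rolling mean over N of those monthly averages, and shift the result by one month so that each row uses only months that are already complete. The sketch below follows that reading. The `Price` name, the business-day sample data, and the `MA_N` column labels are assumptions, and it averages monthly means rather than pooling all daily prices:

```python
import numpy as np
import pandas as pd

# Hypothetical trading-day prices from mid-2005 through 2006.
index = pd.bdate_range(start="2005-07-01", end="2006-12-29")
prices = pd.Series(np.random.RandomState(0).uniform(90, 110, len(index)),
                   index=index, name="Price")

# One average price per calendar month.
monthly = prices.groupby(prices.index.to_period("M")).mean()

# For each N, average the previous N months; shift(1) excludes the current month.
ma = pd.DataFrame({f"MA_{n}": monthly.rolling(n).mean().shift(1)
                   for n in range(1, 7)})

# Keep the rows from January 2006 onward, as in the question.
ma_2006 = ma[ma.index >= pd.Period("2006-01", freq="M")]
print(ma_2006.head())
```

With this layout, the January 2006 row of `MA_1` holds the December 2005 average, and `MA_2` holds the average of the November and December 2005 monthly means.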
Answered by Yanfei W.
First, set Date as the index:
price_df.set_index('Date', inplace=True)
price_df.index = pd.to_datetime(price_df.index)
Then, calculate the moving average over the past N months, for N=i:
mv = price_df.rolling(window=i*30, center=False).mean().dropna()
Finally, return only the rows for the first day of each month (if that is what you want to return):
mv.loc[mv.index.day == 1]
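Two caveats apply when the dates are trading days: a window of `i*30` rows spans more than `i` calendar months, and the first of a month is often not a trading day, so `index.day == 1` can skip months. A hedged alternative sketch uses a time-based window of `30*i` calendar days and keeps the first trading day available in each month. The sample data, the `Price` name, and the choice of 30 days per month are assumptions:

```python
import numpy as np
import pandas as pd

# Hypothetical trading-day prices (business days only).
index = pd.bdate_range(start="2005-07-01", end="2006-12-29", name="Date")
price_df = pd.DataFrame(
    {"Price": np.random.RandomState(1).uniform(90, 110, len(index))},
    index=index,
)

i = 2  # N = 2 months, approximated here as 60 calendar days

# Time-based window: averages every row within the trailing 60 calendar days.
mv = price_df.rolling(window=f"{30 * i}D").mean()

# Keep the first trading day available in each month.
first_days = mv.groupby(mv.index.to_period("M")).head(1)
print(first_days.head())
```

A time-based window averages whatever rows fall inside it, so the earliest rows are means over fewer observations; drop them if a full window is required.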