Python: How to predict time series in scikit-learn?

Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/20841167/

Date: 2020-08-18 21:21:14  Source: igfitidea

How to predict time series in scikit-learn?

python, machine-learning, time-series, scikit-learn

Asked by Roman

Scikit-learn utilizes a very convenient approach based on fit and predict methods. I have time-series data in the format suited for fit and predict.

For example I have the following Xs:

[[1.0, 2.3, 4.5], [6.7, 2.7, 1.2], ..., [3.2, 4.7, 1.1]]

and the corresponding ys:

[[1.0], [2.3], ..., [7.7]]

These data have the following meaning. The values stored in ys form a time series. The values in Xs are the corresponding time-dependent "factors" that are known to have some influence on the values in ys (for example: temperature, humidity and atmospheric pressure).

Now, of course, I can use fit(Xs, ys). But then I get a model in which future values in ys depend only on the factors and do not depend on the previous Y values (at least not directly), and this is a limitation of the model. I would like to have a model in which Y_n also depends on Y_{n-1}, Y_{n-2} and so on. For example, I might want to use an exponential moving average as a model. What is the most elegant way to do this in scikit-learn?

ADDED

As has been mentioned in the comments, I can extend Xs by adding ys. But this approach has some limitations. For example, if I add the last 5 values of y as 5 new columns to X, the information about the time ordering of ys is lost. There is no indication in X that the value in the 5th column follows the value in the 4th column, and so on. As a model, I might want to fit a line to the last five ys and use the fitted linear function to make a prediction. But if I have 5 values in 5 columns, that is not so trivial.
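
The lagged-column idea described above can be pictured with the following minimal sketch. The helper name make_lagged and the toy series are made up for illustration and are not part of scikit-learn. Column k-1 holds y_{n-k}, so the time ordering is encoded only by column position, which is exactly the limitation noted above.

```python
import numpy as np

def make_lagged(y, n_lags):
    """Return (lags, targets); lags[:, k-1] holds y_{n-k} for each target y_n."""
    y = np.asarray(y, dtype=float).ravel()
    # column k-1 is the series shifted back by k steps
    lags = np.column_stack([y[n_lags - k:len(y) - k] for k in range(1, n_lags + 1)])
    targets = y[n_lags:]
    return lags, targets

lags, targets = make_lagged([1, 2, 3, 4, 5, 6], n_lags=2)
print(lags)     # each row: [y_{n-1}, y_{n-2}]
print(targets)  # the matching y_n
```

The lags array could then be stacked next to Xs before calling fit, with the first n_lags rows of Xs dropped so the rows line up.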

ADDED 2

To make my problem clearer, I would like to give one concrete example. I would like to have a "linear" model in which y_n = c + k1*x1 + k2*x2 + k3*x3 + k4*EMOV_n, where EMOV_n is just an exponential moving average. How can I implement this simple model in scikit-learn?

Accepted answer by cjohnson318

This might be what you're looking for, with regard to the exponentially weighted moving average:

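import numpy, pandas
# pandas.stats.moments.ewma was removed from pandas; Series.ewm(...).mean() replaces it
ys_flat = numpy.asarray(ys, dtype=float).ravel()   # ys is a list of 1-element rows
EMOV_n = pandas.Series(ys_flat).ewm(com=2).mean().to_numpy()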

Here, com is the center-of-mass decay parameter, which is described in the pandas documentation. Then you can combine EMOV_n with Xs, using something like:

# append EMOV_n as a new column (vstack would try to add it as a row)
Xs = numpy.column_stack((Xs, EMOV_n))
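
As a side note on the com parameter used above: in pandas it maps to the smoothing factor as alpha = 1 / (1 + com), and span maps to it as alpha = 2 / (span + 1). The sketch below checks that the three spellings agree, using made-up values for the series.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.3, 4.1, 3.8, 5.0, 7.7])  # made-up values

com = 2
by_com = s.ewm(com=com).mean()
by_alpha = s.ewm(alpha=1.0 / (1.0 + com)).mean()   # alpha = 1 / (1 + com)
by_span = s.ewm(span=2 * com + 1).mean()           # alpha = 2 / (span + 1)

# all three parameterisations describe the same smoothing
print(np.allclose(by_com, by_alpha), np.allclose(by_com, by_span))
```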

Then you can look at the various linear models available in scikit-learn and do something like:

from sklearn import linear_model
clf = linear_model.LinearRegression()
clf.fit(Xs, ys)
print(clf.coef_)   # k1, k2, k3, k4; the constant c is in clf.intercept_
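
Putting the steps above together, a self-contained sketch might look like the following. The data are synthetic and the true coefficients are invented for the demonstration. One choice here is an assumption that goes beyond the answer: the moving average is shifted by one step so that the feature for y_n is built only from y_{n-1} and earlier, whereas an unshifted EMOV_n already contains y_n itself.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
Xs = rng.normal(size=(n, 3))                       # stand-ins for the three factors
ys = Xs @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.1, size=n)  # synthetic target

# EMA of *past* targets only: shift by one so row n never sees y_n itself
ema_prev = pd.Series(ys).ewm(com=2).mean().shift(1)

features = np.column_stack((Xs, ema_prev.to_numpy()))[1:]   # drop row 0 (no history yet)
target = ys[1:]

model = LinearRegression().fit(features, target)
print(model.intercept_, model.coef_)   # c, then k1..k4
```

At prediction time the same shifted average has to be recomputed from the targets observed so far, so forecasting several steps ahead would require feeding predictions back in.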

Best of luck!

Answer by cjohnson318

According to Wikipedia, EWMA works well with stationary data, but it does not work as expected in the presence of trends or seasonality. In those cases you should use a second- or third-order EWMA method, respectively. I decided to look at the pandas ewma function to see how it handled trends, and this is what I came up with:

import numpy as np
import pandas
import matplotlib.pyplot as plt

def ewma(values, span):
    # pandas.stats.moments.ewma was removed from pandas; Series.ewm(...).mean() replaces it
    return pandas.Series(values).ewm(span=span).mean().to_numpy()

# make a hat function, and add noise
x = np.linspace(0, 1, 100)
x = np.hstack((x, x[::-1]))
x += np.random.normal(loc=0, scale=0.1, size=200)
plt.plot(x, alpha=0.4, label='Raw')

# take EWMA in both directions with a smaller span term
fwd = ewma(x, span=15)              # take EWMA in fwd direction
bwd = ewma(x[::-1], span=15)        # take EWMA in bwd direction
c = np.vstack((fwd, bwd[::-1]))     # lump fwd and bwd together
c = np.mean(c, axis=0)              # average

# regular EWMA, with bias against trend
plt.plot(ewma(x, span=20), 'b', label='EWMA, span=20')

# "corrected" (?) EWMA
plt.plot(c, 'r', label='Reversed-Recombined')

plt.legend(loc=8)
plt.savefig('ewma_correction.png', format='png', dpi=100)

[Figure: the noisy hat-shaped raw signal, the regular EWMA (span=20), and the reversed-recombined average]

As you can see, the EWMA lags behind the trend both uphill and downhill. We can correct for this (without having to implement a second-order scheme ourselves) by taking the EWMA in both directions and then averaging. I hope your data was stationary!
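
For comparison, a second-order scheme of the kind mentioned above could be sketched as follows. This assumes that "second order" refers to double exponential smoothing (Holt's linear trend method), which tracks a level and a trend. The function name holt_smooth, the default alpha and beta, and the way the trend is initialised are illustrative choices, not something taken from pandas or scikit-learn.

```python
import numpy as np

def holt_smooth(y, alpha=0.3, beta=0.1):
    """Double exponential smoothing: track a level and a trend."""
    y = np.asarray(y, dtype=float)
    level = y[0]
    trend = y[1] - y[0]            # simple initial trend guess
    out = np.empty_like(y)
    out[0] = level
    for t in range(1, len(y)):
        prev_level = level
        # level: blend the new observation with the trend-extrapolated previous level
        level = alpha * y[t] + (1 - alpha) * (prev_level + trend)
        # trend: blend the latest change in level with the previous trend
        trend = beta * (level - prev_level) + (1 - beta) * trend
        out[t] = level
    return out

ramp = np.arange(10, dtype=float)
print(holt_smooth(ramp))   # on a noise-free ramp the level follows the data
```

Unlike the forward and backward average, this version only uses past values at each step, so it can also be run on data as it arrives.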
