Pandas/Statsmodel OLS predicting future values

Notice: this page reproduces a popular StackOverflow question (originally presented as a Chinese-English parallel translation) under the CC BY-SA 4.0 license. You are free to use and share it, but you must comply with the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/25514220/

Date: 2020-09-13 22:23:48  Source: igfitidea

python pandas linear-regression statsmodels

Asked by pythonista

I've been trying to get a prediction for future values in a model I've created. I have tried both OLS in pandas and statsmodels. Here is what I have in statsmodels:

import statsmodels.api as sm
endog = pd.DataFrame(dframe['monthly_data_smoothed8'])
smresults = sm.OLS(dframe['monthly_data_smoothed8'], dframe['date_delta']).fit()
sm_pred = smresults.predict(endog)
sm_pred

The length of the array returned is equal to the number of records in my original dataframe but the values are not the same. When I do the following using pandas I get no values returned.

from pandas.stats.api import ols
res1 = ols(y=dframe['monthly_data_smoothed8'], x=dframe['date_delta'])
res1.predict

(Note that there is no .fit function for OLS in pandas.) Could somebody shed some light on how I might get future predictions from my OLS model in either pandas or statsmodels? I realize I must not be using .predict properly. I've read about the various other problems people have had, but they do not seem to apply to my case.

Edit: I believe 'endog' as defined is incorrect. I should be passing the values for which I want to predict, so I've created a date range of 12 periods past the last recorded value. But I am still missing something, as I am getting the error:

matrices are not aligned

Edit: here is a snippet of the data. The last column of numbers (in red) is the date delta, which is the difference in months from the first date:

        month  monthly_data  monthly_data_smoothed5  monthly_data_smoothed8  monthly_data_smoothed12  monthly_data_smoothed3  date_delta
0  2011-01-31  3.711838e+11            3.711838e+11            3.711838e+11             3.711838e+11            3.711838e+11    0.000000
1  2011-02-28  3.776706e+11            3.750759e+11            3.748327e+11             3.746975e+11            3.755084e+11    0.919937
2  2011-03-31  4.547079e+11            4.127964e+11            4.083554e+11             4.059256e+11            4.207653e+11    1.938438
3  2011-04-30  4.688370e+11            4.360748e+11            4.295531e+11             4.257843e+11            4.464035e+11    2.924085

Answered by chrisb

I think your issue here is that statsmodels doesn't add an intercept by default, so your model is forced through the origin and doesn't achieve much of a fit. Fixing it in your code would look something like this:

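import pandas as pd
import statsmodels.api as sm

dframe = pd.read_clipboard()  # your sample data, copied to the clipboard
dframe['intercept'] = 1       # a column of ones acts as the intercept
X = dframe[['intercept', 'date_delta']]
y = dframe['monthly_data_smoothed8']

smresults = sm.OLS(y, X).fit()

dframe['pred'] = smresults.predict()  # in-sample fitted values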

Also, for what it's worth, I think the statsmodels formula API is much nicer to work with when dealing with DataFrames, and it adds an intercept by default (add - 1 to the formula to remove it). See below; it should give the same answer.

import statsmodels.formula.api as smf

smresults = smf.ols('monthly_data_smoothed8 ~ date_delta', dframe).fit()

dframe['pred'] = smresults.predict()
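
With the formula API, out-of-sample prediction only needs a DataFrame containing the right-hand-side columns. The sketch below assumes the four sample rows from the question; the future date_delta values (4.0, 5.0, 6.0) are made up for illustration.

```python
import pandas as pd
import statsmodels.formula.api as smf

# The four sample rows from the question.
dframe = pd.DataFrame({
    "date_delta": [0.000000, 0.919937, 1.938438, 2.924085],
    "monthly_data_smoothed8": [3.711838e11, 3.748327e11, 4.083554e11, 4.295531e11],
})

smresults = smf.ols("monthly_data_smoothed8 ~ date_delta", data=dframe).fit()

# New data only needs the columns named on the right-hand side of the
# formula; the intercept is handled by the formula machinery.
future = pd.DataFrame({"date_delta": [4.0, 5.0, 6.0]})  # hypothetical future months
future["pred"] = smresults.predict(future)
print(future)
```

Because the formula carries the column names, there is no column-order or missing-constant pitfall when building the new data.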

Edit:

To predict future values, just pass new data to .predict(). For example, using the first model:

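In [165]: smresults.predict(pd.DataFrame({'intercept': 1,
                                          'date_delta': [0.5, 0.75, 1.0]},
                                         columns=['intercept', 'date_delta']))

The columns must be in the same order as the X used for fitting, hence the explicit columns argument. With that order the three predictions are roughly 3.76e+11, 3.81e+11 and 3.86e+11. The output originally posted with this answer, array([2.03927604e+11, 2.95182280e+11, 3.86436955e+11]), is what results when the two columns are swapped, which older pandas versions did by sorting dict keys alphabetically.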
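
For the asker's goal of 12 periods past the last recorded value, the new data has to be expressed in the same units as date_delta. The sketch below assumes date_delta is the number of days since the first date divided by the average month length (365.2425 / 12), which reproduces the sample values such as 0.919937 for 28 days. DAYS_PER_MONTH, future and new_X are illustrative names.

```python
import pandas as pd

DAYS_PER_MONTH = 365.2425 / 12  # assumption: average Gregorian month length

first_date = pd.Timestamp("2011-01-31")
last_date = pd.Timestamp("2011-04-30")  # last row of the sample; use your real last date

# Twelve month-end dates after the last observation.
month_end = pd.offsets.MonthEnd()
future_dates = pd.date_range(last_date + month_end, periods=12, freq=month_end)

future = pd.DataFrame({"month": future_dates})
future["date_delta"] = (future["month"] - first_date).dt.days / DAYS_PER_MONTH
future["intercept"] = 1.0

# Same column order as the X used for fitting the first model.
new_X = future[["intercept", "date_delta"]]
print(new_X.head())
```

Passing new_X to smresults.predict() of the first model should then return one prediction per future month. A frame with the wrong number of columns is the kind of input that triggers the "matrices are not aligned" error.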

On the intercept: there's nothing encoded in the number 1; it's just based on the math of OLS (an intercept is perfectly analogous to a regressor that always equals 1), so you can pull the value right off the summary. Looking at the statsmodels docs, an alternative way to add an intercept would be:

X = sm.add_constant(X)