Pandas/Statsmodels OLS: predicting future values

Question

Asked by pythonista

I've been trying to get a prediction for future values in a model I've created. I have tried both OLS in pandas and statsmodels. Here is what I have in statsmodels:

import statsmodels.api as sm
endog = pd.DataFrame(dframe['monthly_data_smoothed8'])
smresults = sm.OLS(dframe['monthly_data_smoothed8'], dframe['date_delta']).fit()
sm_pred = smresults.predict(endog)
sm_pred

The length of the array returned is equal to the number of records in my original dataframe but the values are not the same. When I do the following using pandas I get no values returned.

from pandas.stats.api import ols
res1 = ols(y=dframe['monthly_data_smoothed8'], x=dframe['date_delta'])
res1.predict

(Note that there is no .fit function for OLS in pandas.) Could somebody shed some light on how I might get future predictions from my OLS model in either pandas or statsmodels? I realize I must not be using .predict properly, and I've read about the other problems people have had with it, but they do not seem to apply to my case.

Edit: I believe 'endog' as defined is incorrect. I should be passing the values for which I want to predict, so I've created a date range of 12 periods past the last recorded value. I am still missing something, though, because I get this error:

matrices are not aligned

Edit: here is a snippet of the data. The last column, date_delta, is the difference in months from the first date:

   month       monthly_data  monthly_data_smoothed5  monthly_data_smoothed8  monthly_data_smoothed12  monthly_data_smoothed3  date_delta
0  2011-01-31  3.711838e+11  3.711838e+11            3.711838e+11            3.711838e+11             3.711838e+11            0.000000
1  2011-02-28  3.776706e+11  3.750759e+11            3.748327e+11            3.746975e+11             3.755084e+11            0.919937
2  2011-03-31  4.547079e+11  4.127964e+11            4.083554e+11            4.059256e+11             4.207653e+11            1.938438
3  2011-04-30  4.688370e+11  4.360748e+11            4.295531e+11            4.257843e+11             4.464035e+11            2.924085

Answer 1

Answered by chrisb

I think your issue here is that statsmodels doesn't add an intercept by default, so your model doesn't achieve much of a fit. Fixing that in your code would look something like this:

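import pandas as pd
import statsmodels.api as sm

dframe = pd.read_clipboard()  # your sample data
dframe['intercept'] = 1  # constant column that acts as the intercept
X = dframe[['intercept', 'date_delta']]
y = dframe['monthly_data_smoothed8']

smresults = sm.OLS(y, X).fit()

# predict() with no arguments returns the fitted values for the training rows
dframe['pred'] = smresults.predict()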

Also, for what it's worth, I think the statsmodels formula API is much nicer to work with when dealing with DataFrames, and it adds an intercept by default (add "- 1" to the formula to remove it). See below; it should give the same answer.

import statsmodels.formula.api as smf

smresults = smf.ols('monthly_data_smoothed8 ~ date_delta', dframe).fit()

dframe['pred'] = smresults.predict()

Edit:

To predict future values, just pass new data to .predict(). For example, using the first model:

In [165]: smresults.predict(pd.DataFrame({'intercept': 1, 
                                          'date_delta': [0.5, 0.75, 1.0]}))
Out[165]: array([  2.03927604e+11,   2.95182280e+11,   3.86436955e+11])
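
For the case in the question (12 periods past the last recorded value), a minimal sketch with the formula interface might look like the following. It assumes the future offsets are simply the last date_delta plus 1 to 12 months; the frame name future is made up for this example. With the formula interface the new frame only needs a date_delta column, because the intercept comes from the formula.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# The four sample rows from the question (date_delta and the smoothed8 series).
dframe = pd.DataFrame({
    "date_delta": [0.000000, 0.919937, 1.938438, 2.924085],
    "monthly_data_smoothed8": [3.711838e+11, 3.748327e+11,
                               4.083554e+11, 4.295531e+11],
})

smresults = smf.ols("monthly_data_smoothed8 ~ date_delta", data=dframe).fit()

# Assumed future offsets: 12 one-month steps past the last observation.
last = dframe["date_delta"].iloc[-1]
future = pd.DataFrame({"date_delta": last + np.arange(1, 13)})

# predict() evaluates the fitted line at the new date_delta values.
future["pred"] = smresults.predict(future)
print(future)
```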

On the intercept: there's nothing special encoded in the number 1. It just follows from the math of OLS (an intercept is exactly analogous to a regressor that always equals 1), so you can read its value straight off the summary. Looking at the statsmodels docs, an alternative way to add an intercept would be:

X = sm.add_constant(X)
