Forecasting using Pandas OLS

Note: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/9943848/

Time: 2020-09-13 15:41:22  Source: igfitidea

Tags: python, pandas, scikits

Asked by Turukawa

I have been using the scikits.statsmodels OLS predict function to forecast fitted data, but would now like to shift to using Pandas.

The documentation refers to OLS as well as to a function called y_predict, but I can't find any documentation on how to use it correctly.

By way of example:

exogenous = {
    "1998": "4760","1999": "5904","2000": "4504","2001": "9808","2002": "4241","2003": "4086","2004": "4687","2005": "7686","2006": "3740","2007": "3075","2008": "3753","2009": "4679","2010": "5468","2011": "7154","2012": "4292","2013": "4283","2014": "4595","2015": "9194","2016": "4221","2017": "4520"}
endogenous = {
    "1998": "691", "1999": "1580", "2000": "80", "2001": "1450", "2002": "555", "2003": "956", "2004": "877", "2005": "614", "2006": "468", "2007": "191"}

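import numpy as np
from pandas import Series, ols  # ols is the legacy pandas regression API (removed in pandas 0.20)

# Regress the endogenous series on the exogenous series;
# only the years present in both series (1998-2007) are fitted
ols_test = ols(y=Series(endogenous), x=Series(exogenous))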

However, while I can produce a fit:

>>> ols_test.y_fitted
1998     675.268299
1999     841.176837
2000     638.141913
2001    1407.354228
2002     600.000352
2003     577.521485
2004     664.681478
2005    1099.611292
2006     527.342854
2007     430.901264

Prediction produces nothing different:

>>> ols_test.y_predict
1998     675.268299
1999     841.176837
2000     638.141913
2001    1407.354228
2002     600.000352
2003     577.521485
2004     664.681478
2005    1099.611292
2006     527.342854
2007     430.901264

In scikits.statsmodels one would do the following:

import scikits.statsmodels.api as sm
...
ols_model = sm.OLS(endogenous, np.column_stack(exogenous))
ols_results = ols_model.fit()
ols_pred = ols_model.predict(np.column_stack(exog_prediction_values))

How do I do this in Pandas to forecast the endogenous data out to the limits of the exogenous?

UPDATE: Thanks to Chang, the new version of Pandas (0.7.3) now has this functionality as standard.

Accepted answer by Chang She

Is your issue how to get the predicted y values of your regression? Or is it how to use the regression coefficients to get predicted y values for a different set of samples for the exogenous variables? In pandas, y_predict and y_fitted should give you the same values, and both should match what the predict method in scikits.statsmodels returns.

If you're looking for the regression coefficients, use ols_test.beta
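
For the second reading of the question (reusing the fitted coefficients on exogenous values outside the fitted sample), here is a minimal sketch. It does not use the legacy pandas `ols` function; it assumes a single regressor plus an intercept and solves the least-squares problem with `numpy.linalg.lstsq`. The variable names (`common`, `slope`, `intercept`, `forecast`, `out_of_sample`) are illustrative only, and the data are the question's values written as integers rather than strings.

```python
import numpy as np
import pandas as pd

# Data from the question, written as numbers rather than strings.
exogenous = pd.Series({
    "1998": 4760, "1999": 5904, "2000": 4504, "2001": 9808, "2002": 4241,
    "2003": 4086, "2004": 4687, "2005": 7686, "2006": 3740, "2007": 3075,
    "2008": 3753, "2009": 4679, "2010": 5468, "2011": 7154, "2012": 4292,
    "2013": 4283, "2014": 4595, "2015": 9194, "2016": 4221, "2017": 4520,
})
endogenous = pd.Series({
    "1998": 691, "1999": 1580, "2000": 80, "2001": 1450, "2002": 555,
    "2003": 956, "2004": 877, "2005": 614, "2006": 468, "2007": 191,
})

# Fit only on the years where both series are observed.
common = endogenous.index.intersection(exogenous.index)

# Design matrix: the regressor plus a column of ones for the intercept
# (assumption of this sketch: one regressor, intercept included).
X = np.column_stack([exogenous[common].to_numpy(dtype=float),
                     np.ones(len(common))])
y = endogenous[common].to_numpy(dtype=float)

# Ordinary least squares: solve min ||X @ beta - y||^2.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
slope, intercept = beta

# Apply the coefficients to every exogenous value, in and out of sample.
forecast = slope * exogenous + intercept

# Keep only the years that were not part of the fit (2008-2017).
out_of_sample = forecast.drop(common)
print(out_of_sample)
```

For the years 1998 to 2007, `forecast` reproduces the `y_fitted` values shown in the question; the remaining entries are the forecasts for the years that have exogenous data only.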