Original question: http://stackoverflow.com/questions/9943848/
Notice: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Forecasting using Pandas OLS
Asked by Turukawa
I have been using the scikits.statsmodels OLS predict function to forecast fitted data, but would now like to shift to using Pandas.
The documentation refers to OLS as well as to a function called y_predict, but I can't find any documentation on how to use it correctly.
By way of example:
exogenous = {
"1998": "4760","1999": "5904","2000": "4504","2001": "9808","2002": "4241","2003": "4086","2004": "4687","2005": "7686","2006": "3740","2007": "3075","2008": "3753","2009": "4679","2010": "5468","2011": "7154","2012": "4292","2013": "4283","2014": "4595","2015": "9194","2016": "4221","2017": "4520"}
endogenous = {
"1998": "691", "1999": "1580", "2000": "80", "2001": "1450", "2002": "555", "2003": "956", "2004": "877", "2005": "614", "2006": "468", "2007": "191"}
import numpy as np
from pandas import *
ols_test = ols(y=Series(endogenous), x=Series(exogenous))
However, while I can produce a fit:
>>> ols_test.y_fitted
1998 675.268299
1999 841.176837
2000 638.141913
2001 1407.354228
2002 600.000352
2003 577.521485
2004 664.681478
2005 1099.611292
2006 527.342854
2007 430.901264
Prediction produces nothing different:
>>> ols_test.y_predict
1998 675.268299
1999 841.176837
2000 638.141913
2001 1407.354228
2002 600.000352
2003 577.521485
2004 664.681478
2005 1099.611292
2006 527.342854
2007 430.901264
In scikits.statsmodels one would do the following:
import scikits.statsmodels.api as sm
...
ols_model = sm.OLS(endogenous, np.column_stack(exogenous))
ols_results = ols_model.fit()
ols_pred = ols_model.predict(np.column_stack(exog_prediction_values))
How do I do this in Pandas to forecast the endogenous data out to the limits of the exogenous?
UPDATE: Thanks to Chang, the new version of Pandas (0.7.3) now has this functionality as standard.
Accepted answer by Chang She
Is your issue how to get the predicted y values of your regression? Or is it how to use the regression coefficients to get predicted y values for a different set of samples of the exogenous variables? The pandas y_predict and y_fitted attributes should give you the same values, and both should match the output of the predict method in scikits.statsmodels.
If you're looking for the regression coefficients, use ols_test.beta.
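To illustrate the second reading of the question (using the coefficients to predict y for exogenous values outside the fitted sample), here is a minimal sketch in plain Python. It does not use the pandas ols API. It assumes a single regressor plus an intercept, converts the question's string values to numbers, and uses a hypothetical helper, fit_simple_ols, that is not part of pandas or statsmodels.

```python
# Data from the question, with the string values converted to numbers.
exogenous = {
    "1998": 4760, "1999": 5904, "2000": 4504, "2001": 9808, "2002": 4241,
    "2003": 4086, "2004": 4687, "2005": 7686, "2006": 3740, "2007": 3075,
    "2008": 3753, "2009": 4679, "2010": 5468, "2011": 7154, "2012": 4292,
    "2013": 4283, "2014": 4595, "2015": 9194, "2016": 4221, "2017": 4520,
}
endogenous = {
    "1998": 691, "1999": 1580, "2000": 80, "2001": 1450, "2002": 555,
    "2003": 956, "2004": 877, "2005": 614, "2006": 468, "2007": 191,
}


def fit_simple_ols(x, y):
    """Hypothetical helper: least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Fit only on the years where both series are observed (1998 to 2007).
fit_years = sorted(set(exogenous) & set(endogenous))
intercept, slope = fit_simple_ols(
    [exogenous[year] for year in fit_years],
    [endogenous[year] for year in fit_years],
)

# In-sample fitted values, comparable to y_fitted in the question.
fitted = {year: intercept + slope * exogenous[year] for year in fit_years}

# Out-of-sample forecast: apply the same coefficients to the remaining
# exogenous years (2008 to 2017), for which no endogenous data exists.
forecast_years = sorted(set(exogenous) - set(endogenous))
forecast = {year: intercept + slope * exogenous[year] for year in forecast_years}

for year in forecast_years:
    print(year, round(forecast[year], 2))
```

The in-sample values in fitted should line up with the y_fitted output shown in the question. The forecast dictionary extends the same line to the years that have exogenous data only.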

