Why do I get only one parameter from a statsmodels OLS fit
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use or share it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/20701484/
Asked by Tom
Here is what I am doing:
$ python
Python 2.7.6 (v2.7.6:3a1db0d2747e, Nov 10 2013, 00:42:54)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
>>> import statsmodels.api as sm
>>> import statsmodels
>>> statsmodels.__version__
'0.5.0'
>>> import numpy
>>> y = numpy.array([1,2,3,4,5,6,7,8,9])
>>> X = numpy.array([1,1,2,2,3,3,4,4,5])
>>> res_ols = sm.OLS(y, X).fit()
>>> res_ols.params
array([ 1.82352941])
I had expected an array with two elements, the intercept and the slope coefficient. Why do I get only one?
Accepted answer by behzad.nouri
Try this:
X = sm.add_constant(X)
sm.OLS(y,X)
as in the documentation:
An intercept is not included by default and should be added by the user
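To see why this fixes the shape of params (an illustrative sketch, not part of the original answer): add_constant just adds a column of ones to the design matrix, and OLS returns one coefficient per column, so with the constant you get the intercept plus the slope. Depending on the statsmodels version, the column of ones may be prepended or appended:
import numpy as np
import statsmodels.api as sm

X = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5])
X_with_const = sm.add_constant(X)  # 2-D array: a column of ones plus the original column
print(X_with_const.shape)          # (9, 2) -> OLS will now return two parameters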
Answered by Tom
Just to be complete, this works:
>>> import numpy
>>> import statsmodels.api as sm
>>> y = numpy.array([1,2,3,4,5,6,7,8,9])
>>> X = numpy.array([1,1,2,2,3,3,4,4,5])
>>> X = sm.add_constant(X)
>>> res_ols = sm.OLS(y, X).fit()
>>> res_ols.params
array([-0.35714286, 1.92857143])
It does give me a different slope coefficient, but I guess that figures, as we now do have an intercept.
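As a quick cross-check (my own addition, plain numpy): without a constant column, OLS fits a line through the origin, so its single coefficient is sum(x*y)/sum(x*x); with the constant it is the ordinary least-squares slope. Both numbers above can be reproduced directly:
import numpy as np

y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
X = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5], dtype=float)

print(np.dot(X, y) / np.dot(X, X))                # ~1.8235, the no-intercept coefficient
xc = X - X.mean()
print(np.dot(xc, y - y.mean()) / np.dot(xc, xc))  # ~1.9286, the slope once an intercept is added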
Answered by lukearmistead
I'm running 0.6.1 and it looks like the "add_constant" function has been moved into the statsmodels.tools module. Here's what I ran that worked:
res_ols = sm.OLS(y, statsmodels.tools.add_constant(X)).fit()
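For completeness, a self-contained version of that one-liner (my sketch; it assumes add_constant can be imported from statsmodels.tools, as used above, and reuses the data from the question):
import numpy as np
import statsmodels.api as sm
from statsmodels.tools import add_constant

y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
X = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5])

res_ols = sm.OLS(y, add_constant(X)).fit()
print(res_ols.params)  # two entries: the constant and the slope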
Answered by sup
Try this, it worked for me:
import statsmodels.formula.api as sm
from statsmodels.api import add_constant
X_train = add_constant(X_train)
X_test = add_constant(X_test)
model = sm.OLS(y_train,X_train)
results = model.fit()
y_pred=results.predict(X_test)
results.params
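One caveat I would add (not part of the original answer): this snippet imports statsmodels.formula.api but still calls the array-based OLS class, which in the statsmodels versions I have seen recently lives in statsmodels.api rather than statsmodels.formula.api. The formula interface itself (lowercase ols) adds the intercept automatically, so no add_constant is needed there. A minimal sketch with the question's data:
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"x": [1, 1, 2, 2, 3, 3, 4, 4, 5],
                   "y": [1, 2, 3, 4, 5, 6, 7, 8, 9]})
res = smf.ols("y ~ x", data=df).fit()  # the formula API includes an intercept by default
print(res.params)                      # Intercept and x, roughly -0.357 and 1.929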
Answered by R.jzadeh
I did add the code X = sm.add_constant(X), but Python did not return the intercept value, so, using a little algebra, I decided to do it myself in code:
This code computes a regression over 35 samples with 7 features, plus one intercept value that I added as a feature to the equation:
import statsmodels.api as sm
from sklearn import datasets  ## imports datasets from scikit-learn
import numpy as np
import pandas as pd

x = np.empty((35, 8))  # (numSamples, oneIntercept + numFeatures)
feature_names = np.empty((8,))
y = np.empty((35,))

dbfv = open("dataset.csv").readlines()

interceptConstant = 1
i = 0

# reading data and writing it into numpy arrays
while i < len(dbfv):
    cells = dbfv[i].split(",")
    j = 0
    x[i][j] = interceptConstant
    feature_names[j] = str(j)
    while j < len(cells) - 1:
        x[i][j + 1] = cells[j]
        feature_names[j + 1] = str(j + 1)
        j += 1
    y[i] = cells[len(cells) - 1]
    i += 1

# creating dataframes
df = pd.DataFrame(x, columns=feature_names)
target = pd.DataFrame(y, columns=["TARGET"])

X = df
y = target["TARGET"]

model = sm.OLS(y, X).fit()
print(model.params)

# predictions = model.predict(X)  # make the predictions by the model

# Print out the statistics
print(model.summary())
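The hand-rolled loop above is doing the same thing as add_constant: putting a column of ones next to the features. With numpy, the design matrix can be built in one line; here is a sketch with the question's single-feature data (my addition, which should give the same parameters as the add_constant route):
import numpy as np
import statsmodels.api as sm

y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
X = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5])

X_design = np.column_stack([np.ones(len(X)), X])  # first column is the intercept term
print(sm.OLS(y, X_design).fit().params)           # approximately [-0.357, 1.929]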

