python sklearn多元线性回归显示r平方
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/42033720/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
python sklearn multiple linear regression display r-squared
提问by jeangelj
I calculated my multiple linear regression equation and I want to see the adjusted R-squared. I know that the score function allows me to see r-squared, but it is not adjusted.
我计算了我的多元线性回归方程,我想查看调整后的 R 平方。我知道 score 函数可以让我看到 r 平方,但它没有调整。
import pandas as pd #import the pandas module
import numpy as np
df = pd.read_csv ('/Users/jeangelj/Documents/training/linexdata.csv', sep=',')
df
AverageNumberofTickets NumberofEmployees ValueofContract Industry
0 1 51 25750 Retail
1 9 68 25000 Services
2 20 67 40000 Services
3 1 124 35000 Retail
4 8 124 25000 Manufacturing
5 30 134 50000 Services
6 20 157 48000 Retail
7 8 190 32000 Retail
8 20 205 70000 Retail
9 50 230 75000 Manufacturing
10 35 265 50000 Manufacturing
11 65 296 75000 Services
12 35 336 50000 Manufacturing
13 60 359 75000 Manufacturing
14 85 403 81000 Services
15 40 418 60000 Retail
16 75 437 53000 Services
17 85 451 90000 Services
18 65 465 70000 Retail
19 95 491 100000 Services
from sklearn.linear_model import LinearRegression
model = LinearRegression()
X, y = df[['NumberofEmployees','ValueofContract']], df.AverageNumberofTickets
model.fit(X, y)
model.score(X, y)
>>0.87764337132340009
I checked it manually and 0.87764 is R-squared; whereas 0.863248 is the adjusted R-squared.
我手动检查过,0.87764 是 R 平方;而 0.863248 是调整后的 R 平方。
回答by Sandipan Dey
There are many different ways to compute R^2
and the adjusted R^2
, the following are few of them (computed with the data you provided):
有许多不同的计算方法R^2
和adjusted R^2
,以下是其中一些(使用您提供的数据计算):
from sklearn.linear_model import LinearRegression
model = LinearRegression()
X, y = df[['NumberofEmployees','ValueofContract']], df.AverageNumberofTickets
model.fit(X, y)
SST = SSR + SSE (ref definitions)
SST = SSR + SSE(参考定义)
# compute with formulas from the theory
yhat = model.predict(X)
SS_Residual = sum((y-yhat)**2)
SS_Total = sum((y-np.mean(y))**2)
r_squared = 1 - (float(SS_Residual))/SS_Total
adjusted_r_squared = 1 - (1-r_squared)*(len(y)-1)/(len(y)-X.shape[1]-1)
print r_squared, adjusted_r_squared
# 0.877643371323 0.863248473832
# compute with sklearn linear_model, although could not find any function to compute adjusted-r-square directly from documentation
print model.score(X, y), 1 - (1-model.score(X, y))*(len(y)-1)/(len(y)-X.shape[1]-1)
# 0.877643371323 0.863248473832
Another way:
其它的办法:
# compute with statsmodels, by adding intercept manually
import statsmodels.api as sm
X1 = sm.add_constant(X)
result = sm.OLS(y, X1).fit()
#print dir(result)
print result.rsquared, result.rsquared_adj
# 0.877643371323 0.863248473832
Yet another way:
还有一种方式:
# compute with statsmodels, another way, using formula
import statsmodels.formula.api as sm
result = sm.ols(formula="AverageNumberofTickets ~ NumberofEmployees + ValueofContract", data=df).fit()
#print result.summary()
print result.rsquared, result.rsquared_adj
# 0.877643371323 0.863248473832
回答by Madhushree
regressor = LinearRegression(fit_intercept=False)
regressor.fit(x_train, y_train)
print(f'r_sqr value: {regressor.score(x_train, y_train)}')