How to get a regression summary in Python scikit like R does?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/26319259/

Time: 2020-08-19 00:21:32  Source: igfitidea

How to get a regression summary in Python scikit like R does?

Tags: python, r, scikit-learn, linear-regression, summary

Asked by mpg

As an R user, I wanted to also get up to speed on scikit.

Creating linear regression models is fine, but I can't seem to find a reasonable way to get a standard summary of the regression output.

Code example:

# Linear Regression
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression

# Load the diabetes datasets
dataset = datasets.load_diabetes()

# Fit a linear regression model to the data
model = LinearRegression()
model.fit(dataset.data, dataset.target)
print(model)

# Make predictions
expected = dataset.target
predicted = model.predict(dataset.data)

# Summarize the fit of the model
mse = np.mean((predicted-expected)**2)
print(model.intercept_, model.coef_, mse)
print(model.score(dataset.data, dataset.target))

Issues:

  • It seems the intercept and coef are built into the model, and I just type print (second-to-last line) to see them.
  • What about all the other standard regression output like R^2, adjusted R^2, p-values, etc.? If I read the examples correctly, it seems you have to write a function/equation for each of these and then print it.
  • So, is there no standard summary output for linear regression models?
  • Also, in my printed array of coefficients, there are no variable names associated with each of them; I just get the numeric array. Is there a way to print these so that I get each coefficient along with the variable it goes with?

My printed output:

LinearRegression(copy_X=True, fit_intercept=True, normalize=False)
152.133484163 [ -10.01219782 -239.81908937  519.83978679  324.39042769 -792.18416163
  476.74583782  101.04457032  177.06417623  751.27932109   67.62538639] 2859.69039877
0.517749425413

Notes: Started off with Linear, Ridge and Lasso. I have gone through the examples. Below is for the basic OLS.

Accepted answer by eickenberg

There exists no R type regression summary report in sklearn. The main reason is that sklearn is used for predictive modelling / machine learning and the evaluation criteria are based on performance on previously unseen data (such as predictive r^2 for regression).
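
To make "performance on previously unseen data" concrete, here is a minimal sketch that scores the same diabetes model on a held-out split. The 25% test size and `random_state=0` are arbitrary choices of mine, not something from the answer.

```python
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Load the same diabetes data used in the question
X, y = datasets.load_diabetes(return_X_y=True)

# Hold out 25% of the rows; test_size and random_state are arbitrary choices
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit on the training rows only
model = LinearRegression().fit(X_train, y_train)

# R^2 on the rows used for fitting versus rows the model has never seen
r2_train = r2_score(y_train, model.predict(X_train))
r2_test = r2_score(y_test, model.predict(X_test))
print("R^2 (train):   ", round(r2_train, 4))
print("R^2 (held out):", round(r2_test, 4))
```

The held-out number is the "predictive r^2" the answer refers to; it will usually differ from the in-sample value that `model.score` gives on the training data.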

There does exist a summary function for classification called sklearn.metrics.classification_report, which calculates several types of (predictive) scores on a classification model.
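
For reference, a small sketch of what that classification report looks like. The label lists are made up for illustration only.

```python
from sklearn.metrics import classification_report

# Hypothetical true and predicted class labels, invented for this example
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 0, 2]

# Returns a text table with precision, recall, f1-score and support per class
report = classification_report(y_true, y_pred)
print(report)
```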

For a more classic statistical approach, take a look at statsmodels.

Answer by Vinicius Barcelos

Use model.summary() after predict

# Linear Regression
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression

# load the diabetes datasets
dataset = datasets.load_diabetes()

# fit a linear regression model to the data
model = LinearRegression()
model.fit(dataset.data, dataset.target)
print(model)

# make predictions
expected = dataset.target
predicted = model.predict(dataset.data)

# >>>>>>>Print out the statistics<<<<<<<<<<<<<
model.summary()

# summarize the fit of the model
mse = np.mean((predicted-expected)**2)
print(model.intercept_, model.coef_, mse)
print(model.score(dataset.data, dataset.target))
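
As far as I can tell, scikit-learn's LinearRegression does not provide a summary() method, so the call in the code above would be expected to raise an AttributeError. A minimal sketch of what can be done with scikit-learn alone: check for the method, then pair each coefficient with its column name from `dataset.feature_names`. The `coef_by_name` variable is just a name I picked.

```python
from sklearn import datasets
from sklearn.linear_model import LinearRegression

dataset = datasets.load_diabetes()
model = LinearRegression().fit(dataset.data, dataset.target)

# Check whether the fitted estimator exposes a summary() method at all
print("has summary():", hasattr(model, "summary"))

# Pair each coefficient with the matching column name from the dataset
coef_by_name = dict(zip(dataset.feature_names, model.coef_))
print("intercept:", round(model.intercept_, 3))
for name, coef in coef_by_name.items():
    print(f"{name:>4}: {coef: .3f}")
```

This covers the variable-name part of the question, but not p-values or standard errors.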

Answer by Akshay Dalal

The statsmodels package gives a quite decent summary

from statsmodels.api import OLS
OLS(dataset.target,dataset.data).fit().summary()

Answer by Naomi Fridman

I use:

import numpy as np
import sklearn.metrics as metrics
def regression_results(y_true, y_pred):

    # Regression metrics
    explained_variance=metrics.explained_variance_score(y_true, y_pred)
    mean_absolute_error=metrics.mean_absolute_error(y_true, y_pred) 
    mse=metrics.mean_squared_error(y_true, y_pred) 
    mean_squared_log_error=metrics.mean_squared_log_error(y_true, y_pred)
    median_absolute_error=metrics.median_absolute_error(y_true, y_pred)
    r2=metrics.r2_score(y_true, y_pred)

    print('explained_variance: ', round(explained_variance,4))    
    print('mean_squared_log_error: ', round(mean_squared_log_error,4))
    print('r2: ', round(r2,4))
    print('MAE: ', round(mean_absolute_error,4))
    print('MSE: ', round(mse,4))
    print('RMSE: ', round(np.sqrt(mse),4))
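
The function above does not report adjusted R^2, which the question also asks about. A minimal sketch using the usual formula, 1 - (1 - R^2)(n - 1)/(n - p - 1), for a model with an intercept. The helper name `adjusted_r2` is my own, and the inputs are the R^2 printed in the question plus the diabetes data's 442 rows and 10 features.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 for a model with an intercept and n_features predictors."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# R^2 printed in the question; the diabetes data has 442 rows and 10 features
r2 = 0.517749425413
adj = adjusted_r2(r2, n_samples=442, n_features=10)
print("adjusted R^2:", round(adj, 4))
```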

Answer by Sahil Kamboj

You can do this using statsmodels:

import statsmodels.api as sm
X = sm.add_constant(X.ravel())
results = sm.OLS(y, X).fit()
results.summary()  

results.summary() will organize the results into three tables
