How to get a regression summary in Python scikit like R does?

Question

Asked by mpg

As an R user, I wanted to also get up to speed on scikit.

Creating linear regression models is fine, but I can't seem to find a reasonable way to get a standard summary of the regression output.

Code example:

# Linear Regression
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression

# Load the diabetes datasets
dataset = datasets.load_diabetes()

# Fit a linear regression model to the data
model = LinearRegression()
model.fit(dataset.data, dataset.target)
print(model)

# Make predictions
expected = dataset.target
predicted = model.predict(dataset.data)

# Summarize the fit of the model
mse = np.mean((predicted-expected)**2)
print(model.intercept_, model.coef_, mse)
print(model.score(dataset.data, dataset.target))

Issues:

It seems the intercept and coef are built into the model, and I just print them (second-to-last line) to see them.
What about all the other standard regression output, like R^2, adjusted R^2, p-values, etc.? If I read the examples correctly, it seems you have to write a function/equation for each of these and then print it.
So, is there no standard summary output for linear regression models?
Also, in my printed array of coefficients, there are no variable names associated with the values; I just get the numeric array. Is there a way to print the coefficients together with the variables they go with?

My printed output:

LinearRegression(copy_X=True, fit_intercept=True, normalize=False)
152.133484163 [ -10.01219782 -239.81908937  519.83978679  324.39042769 -792.18416163
  476.74583782  101.04457032  177.06417623  751.27932109   67.62538639] 2859.69039877
0.517749425413

Notes: I started off with Linear, Ridge and Lasso, and I have gone through the examples. The code above is for basic OLS.

Answer 1

Accepted answer by eickenberg

There is no R-style regression summary report in sklearn. The main reason is that sklearn is used for predictive modelling / machine learning, and its evaluation criteria are based on performance on previously unseen data (such as predictive R^2 for regression).

There does exist a summary function for classification called sklearn.metrics.classification_report, which calculates several types of (predictive) scores on a classification model.

For a more classic statistical approach, take a look at statsmodels.
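
A minimal sketch of what "performance on previously unseen data" can look like, assuming the diabetes data from the question. The predictive R^2 is scored on a held-out split, adjusted R^2 is written out by hand from its textbook formula, and each coefficient is paired with its name from dataset.feature_names, which addresses the question about variable names. The split size and random_state are arbitrary choices made for illustration.

```python
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

dataset = datasets.load_diabetes()

# Hold out a quarter of the rows (arbitrary choice for this sketch)
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# Predictive R^2: scored on rows the model did not see during fitting
r2_test = model.score(X_test, y_test)

# Adjusted R^2 on the training rows, computed from the textbook formula
n, p = X_train.shape
r2_train = model.score(X_train, y_train)
adj_r2_train = 1 - (1 - r2_train) * (n - 1) / (n - p - 1)

# Pair each coefficient with the variable it belongs to
print(f"intercept: {model.intercept_:10.3f}")
for name, coef in zip(dataset.feature_names, model.coef_):
    print(f"{name:>9}: {coef:10.3f}")

print("train R^2:", r2_train)
print("train adjusted R^2:", adj_r2_train)
print("held-out R^2:", r2_test)
```

This still gives no p-values or standard errors; for those, statsmodels is the more natural tool.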

Answer 2

Answer by Vinicius Barcelos

Use model.summary() after predict

# Linear Regression
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression

# load the diabetes datasets
dataset = datasets.load_diabetes()

# fit a linear regression model to the data
model = LinearRegression()
model.fit(dataset.data, dataset.target)
print(model)

# make predictions
expected = dataset.target
predicted = model.predict(dataset.data)

# >>>>>>>Print out the statistics<<<<<<<<<<<<<
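# Caution: sklearn's LinearRegression has no summary() method, so the call
# below raises AttributeError if uncommented. summary() is available on
# statsmodels results objects instead.
# model.summary()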

# summarize the fit of the model
mse = np.mean((predicted-expected)**2)
print(model.intercept_, model.coef_, mse)
print(model.score(dataset.data, dataset.target))

Answer 3

Answer by Akshay Dalal

The statsmodels package gives a quite decent summary:

from statsmodels.api import OLS
OLS(dataset.target,dataset.data).fit().summary()

Answer 4

Answer by Naomi Fridman

I use:

import numpy as np  # needed for np.sqrt below
import sklearn.metrics as metrics
def regression_results(y_true, y_pred):

    # Regression metrics
    explained_variance=metrics.explained_variance_score(y_true, y_pred)
    mean_absolute_error=metrics.mean_absolute_error(y_true, y_pred) 
    mse=metrics.mean_squared_error(y_true, y_pred) 
    mean_squared_log_error=metrics.mean_squared_log_error(y_true, y_pred)
    median_absolute_error=metrics.median_absolute_error(y_true, y_pred)
    r2=metrics.r2_score(y_true, y_pred)

    print('explained_variance: ', round(explained_variance,4))    
    print('mean_squared_log_error: ', round(mean_squared_log_error,4))
    print('r2: ', round(r2,4))
    print('MAE: ', round(mean_absolute_error,4))
    print('MSE: ', round(mse,4))
    print('RMSE: ', round(np.sqrt(mse),4))
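
A possible variant of the same idea, sketched under a few assumptions: the helper name regression_metrics is hypothetical, it returns a dict instead of printing, and it skips mean_squared_log_error when any value is negative, because sklearn raises a ValueError in that case. It is applied to the diabetes model from the question.

```python
import numpy as np
import sklearn.metrics as metrics
from sklearn import datasets
from sklearn.linear_model import LinearRegression


def regression_metrics(y_true, y_pred):
    """Hypothetical helper: collect common regression metrics in a dict."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mse = metrics.mean_squared_error(y_true, y_pred)
    results = {
        "explained_variance": metrics.explained_variance_score(y_true, y_pred),
        "r2": metrics.r2_score(y_true, y_pred),
        "MAE": metrics.mean_absolute_error(y_true, y_pred),
        "median_AE": metrics.median_absolute_error(y_true, y_pred),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
    }
    # MSLE is only computed when no value is negative
    if (y_true >= 0).all() and (y_pred >= 0).all():
        results["MSLE"] = metrics.mean_squared_log_error(y_true, y_pred)
    return results


dataset = datasets.load_diabetes()
model = LinearRegression().fit(dataset.data, dataset.target)
res = regression_metrics(dataset.target, model.predict(dataset.data))

for name, value in res.items():
    print(f"{name}: {value:.4f}")
```

Returning a dict rather than printing makes it easier to compare several models side by side.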

Answer 5

Answer by Sahil Kamboj

You can do this using statsmodels:

import statsmodels.api as sm
# X is assumed to hold a single feature; add_constant adds the intercept column
X = sm.add_constant(X.ravel())
results = sm.OLS(y, X).fit()
results.summary()

results.summary() will organize the results into three tables
