Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/23309073/

How is the R2 value in Scikit learn calculated?

Tags: python, machine-learning, statistics, scikit-learn

Asked by joeally

The R^2 value returned by scikit-learn (metrics.r2_score()) can be negative. The docs say:

"Unlike most other scores, R2 score may be negative (it need not actually be the square of a quantity R)."

However, the Wikipedia article on R^2 mentions no R (not squared) quantity. Perhaps it uses absolute differences instead of squared differences. I really have no idea.

Accepted answer by eickenberg

The R^2 in scikit-learn is essentially the same as what is described in the Wikipedia article on the coefficient of determination (grep for "the most general definition"). It is 1 - (residual sum of squares) / (total sum of squares).
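
As a minimal sketch of that definition, the snippet below computes 1 - RSS/TSS by hand and compares it with metrics.r2_score(). The toy arrays and the variable names (rss, tss, r2_manual) are made up for illustration and are not from the original answer.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical toy data, chosen only for illustration.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Residual sum of squares: squared errors of the predictions.
rss = np.sum((y_true - y_pred) ** 2)
# Total sum of squares: squared deviations of y_true from its own mean.
tss = np.sum((y_true - y_true.mean()) ** 2)

r2_manual = 1.0 - rss / tss
r2_sklearn = r2_score(y_true, y_pred)

print(r2_manual, r2_sklearn)
```

With default arguments and a single target, the two numbers should agree up to floating-point rounding.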

The big difference between a classical statistics setting and what you usually do in machine learning is that in machine learning you evaluate your score on unseen data, which can lead to results outside [0, 1]. If you apply R^2 to the same data you used to fit your model, it will lie within [0, 1] (this is guaranteed for a least-squares linear fit with an intercept, which can never do worse on its own training data than predicting the mean).
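
A small sketch of that contrast, assuming an ordinary least-squares fit with LinearRegression: the score on the training data stays in [0, 1], while the score on held-out data that follows a different trend is negative. The data are contrived for illustration (the "unseen" targets deliberately run in the opposite direction) and are not from the original answer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Contrived training data with an increasing trend.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 2.0, 3.5])

# Contrived "unseen" data whose trend is reversed, so the model fits it badly.
X_test = np.array([[0.0], [1.0], [2.0], [3.0]])
y_test = np.array([3.0, 2.0, 1.0, 0.0])

model = LinearRegression().fit(X_train, y_train)

r2_train = r2_score(y_train, model.predict(X_train))  # same data used for fitting
r2_test = r2_score(y_test, model.predict(X_test))     # data the model never saw

print("train R^2:", r2_train)
print("test  R^2:", r2_test)
```

On the held-out data the fitted line is farther from the targets than their own mean is, so RSS exceeds TSS and the score drops below zero.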

See also this very similar question

Answer by ManiS

Since R^2 = 1 - RSS/TSS, R^2 is negative only when RSS/TSS > 1, which happens when our model is even worse than the worst model assumed (the model that always predicts the mean).

Here RSS is the sum of squared differences between the actual values (yi) and the predicted values (yi^), and TSS is the sum of squared differences between the actual values (yi) and their mean value (that is, the error of predicting the mean, before applying regression). A perfect model has RSS = 0 and the mean model has RSS = TSS, so a model that sits between those two gives RSS/TSS < 1 and an R^2 between 0 and 1. If our model is even worse than the mean model, then RSS > TSS (in squared terms, the predictions are farther from the actual observations than the mean value is), and R^2 is negative.
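
The three cases can be sketched with NumPy alone. The helper r2 and the toy arrays below are hypothetical names and values introduced for illustration; they are not part of scikit-learn or of the original answer.

```python
import numpy as np

def r2(y_true, y_pred):
    """Hypothetical helper: R^2 = 1 - RSS/TSS."""
    rss = np.sum((y_true - y_pred) ** 2)
    tss = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - rss / tss

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

perfect_pred = y.copy()                # RSS = 0, so R^2 = 1
mean_pred = np.full_like(y, y.mean())  # mean model: RSS = TSS, so R^2 = 0
bad_pred = y[::-1].copy()              # reversed order: RSS > TSS, so R^2 < 0

print(r2(y, perfect_pred), r2(y, mean_pred), r2(y, bad_pred))
```

Any predictor whose squared error exceeds that of the constant mean prediction lands below zero, which is why the score has no lower bound.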

Check here for better intuition with visual representation: https://ragrawal.wordpress.com/2017/05/06/intuition-behind-r2-and-other-regression-evaluation-metrics/
