Note: this page is reproduced from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/17197492/

Is there a library function for Root mean square error (RMSE) in python?

python, scikit-learn, scipy

Asked by siamii

I know I could implement a root mean squared error function like this:

import numpy as np

def rmse(predictions, targets):
    return np.sqrt(((predictions - targets) ** 2).mean())

What I want to know is whether this rmse function is already implemented in a library somewhere, perhaps in scipy or scikit-learn.

Answered by Cokes

This is probably faster:

n = len(predictions)
rmse = np.linalg.norm(predictions - targets) / np.sqrt(n)
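
As a quick check, the norm-based form and the explicit square/mean/root form should agree up to floating-point rounding for 1-D arrays. The sketch below uses made-up sample arrays and two illustrative function names (rmse_mean and rmse_norm are not from any library); it does not measure which one is faster.

```python
import numpy as np

def rmse_mean(predictions, targets):
    """RMSE via the explicit square / mean / square-root steps."""
    return np.sqrt(((predictions - targets) ** 2).mean())

def rmse_norm(predictions, targets):
    """RMSE via the Euclidean norm of the error vector divided by sqrt(n)."""
    n = len(predictions)
    return np.linalg.norm(predictions - targets) / np.sqrt(n)

# Made-up sample data for illustration only.
predictions = np.array([2.5, 0.0, 2.0, 8.0])
targets = np.array([3.0, -0.5, 2.0, 7.0])

print(rmse_mean(predictions, targets))
print(rmse_norm(predictions, targets))
```

Note that np.linalg.norm without an axis argument flattens its input, so for multi-dimensional arrays n would have to be the total element count rather than len(predictions).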

Answered by Josef

Actually, I did write a bunch of those as utility functions for statsmodels:

http://statsmodels.sourceforge.net/devel/tools.html#measure-for-fit-performance-eval-measures

and http://statsmodels.sourceforge.net/devel/generated/statsmodels.tools.eval_measures.rmse.html#statsmodels.tools.eval_measures.rmse

They are mostly one- or two-liners without much input checking, intended mainly for easily getting some statistics when comparing arrays. But they have unit tests for the axis arguments, because that's where I sometimes make sloppy mistakes.
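
To illustrate why an axis argument is easy to get wrong, here is a minimal NumPy sketch of an RMSE that reduces along a chosen axis. It is not the statsmodels source; rmse_axis is a hypothetical helper and the arrays are made up.

```python
import numpy as np

def rmse_axis(x1, x2, axis=0):
    """Hypothetical helper: RMSE of x1 - x2 reduced along `axis`."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return np.sqrt(np.mean((x1 - x2) ** 2, axis=axis))

# Made-up 2x2 arrays for illustration only.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[1.0, 0.0], [0.0, 4.0]])

per_column = rmse_axis(a, b, axis=0)  # one value per column
per_row = rmse_axis(a, b, axis=1)     # one value per row
print(per_column, per_row)
```

The two results contain the same numbers in a different order here, which is exactly the kind of mix-up a unit test on the axis argument would catch.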

Answered by Greg

sklearn.metrics has a mean_squared_error function. The RMSE is just the square root of whatever it returns.

from sklearn.metrics import mean_squared_error
from math import sqrt

rms = sqrt(mean_squared_error(y_actual, y_predicted))

Answered by Eric Leschinski

What is RMSE? Also known as RMSD or RMS error, and closely related to MSE. What problem does it solve?

If you understand RMSE (root mean squared error), MSE (mean squared error), RMSD (root mean squared deviation), and RMS (root mean square), then asking for a library to calculate this for you is unnecessary over-engineering. Each of these metrics is a single line of Python code at most 2 inches long. The four metrics RMSE, MSE, RMSD, and RMS are at their core conceptually very close: RMSD is another name for RMSE, and MSE is simply RMSE squared.

RMSE answers the question: "How similar, on average, are the numbers in list1 to list2?" The two lists must be the same size. I want to "wash out the noise between any two given elements, wash out the size of the data collected, and get a single-number feel for change over time".

Intuition and ELI5 for RMSE:

Imagine you are learning to throw darts at a dart board. Every day you practice for one hour. You want to figure out if you are getting better or getting worse. So every day you make 10 throws and measure the distance between the bullseye and where your dart hit.

You make a list of those numbers, list1. Use the root mean squared error between the distances on day 1 and a list2 containing all zeros. Do the same on the 2nd and nth days. What you will get is a single number that hopefully decreases over time. When your RMSE number is zero, you hit the bullseye every time. If the RMSE number goes up, you are getting worse.
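
The dart-board idea can be sketched as follows. The distances are invented for illustration (assumed to be in centimetres), and list2 is the all-zeros list described above.

```python
import numpy as np

def rmse(predictions, targets):
    return np.sqrt(((predictions - targets) ** 2).mean())

# Made-up distances from the bullseye (assumed cm) for 10 throws per day.
throws_by_day = {
    1: [12.0, 9.5, 15.0, 8.0, 11.0, 14.5, 10.0, 13.0, 9.0, 12.5],
    2: [10.0, 8.5, 11.0, 7.0, 9.5, 12.0, 8.0, 10.5, 7.5, 9.0],
    3: [6.0, 7.5, 5.0, 8.0, 4.5, 6.5, 7.0, 5.5, 6.0, 4.0],
}

daily_rmse = {}
for day, distances in throws_by_day.items():
    list1 = np.array(distances)
    list2 = np.zeros_like(list1)  # ideal: every dart lands on the bullseye
    daily_rmse[day] = rmse(list1, list2)
    print(f"day {day}: RMSE = {daily_rmse[day]:.2f}")
```

With these numbers the daily RMSE shrinks from day 1 to day 3, which is the "getting better" signal the analogy describes.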

Example of calculating root mean squared error in Python:

import numpy as np
d = [0.000, 0.166, 0.333]   #ideal target distances, these can be all zeros.
p = [0.000, 0.254, 0.998]   #your performance goes here

print("d is: " + str(["%.8f" % elem for elem in d]))
print("p is: " + str(["%.8f" % elem for elem in p]))

def rmse(predictions, targets):
    return np.sqrt(((predictions - targets) ** 2).mean())

rmse_val = rmse(np.array(d), np.array(p))
print("rms error between lists d and p is: " + str(rmse_val))

Which prints:

d is: ['0.00000000', '0.16600000', '0.33300000']
p is: ['0.00000000', '0.25400000', '0.99800000']
rms error between lists d and p is: 0.387284994115

The mathematical notation:

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (d_i - p_i)^2}

Glyph legend: n is a positive integer representing the number of throws. i is a positive integer counter that indexes the sum. d stands for the ideal distances, the list2 containing all zeros in the above example. p stands for performance, the list1 in the above example. Superscript 2 stands for squaring. d_i is the i-th element of d. p_i is the i-th element of p.

The RMSE computed in small steps so it can be understood:

def rmse(predictions, targets):

    differences = predictions - targets                       #the DIFFERENCEs.

    differences_squared = differences ** 2                    #the SQUAREs of ^

    mean_of_differences_squared = differences_squared.mean()  #the MEAN of ^

    rmse_val = np.sqrt(mean_of_differences_squared)           #ROOT of ^

    return rmse_val                                           #get the ^

How every step of RMSE works:

Subtracting one number from another gives you the signed difference between them; its absolute value is the distance.

8 - 5 = 3         #absolute distance between 8 and 5 is +3
-20 - 10 = -30    #absolute distance between -20 and 10 is +30

If you multiply any number by itself, the result is never negative, because negative times negative is positive:

3*3     = 9   = positive
-30*-30 = 900 = positive

Add them all up. But wait, then an array with many elements would have a larger error than a small array, so divide the sum by the number of elements to get the average.

But wait, we squared them all earlier to force them positive. Undo the damage with a square root!

That leaves you with a single number that represents, on average, the distance between every value of list1 and its corresponding element of list2.
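
The steps above can be traced on the d and p lists from the earlier example. This is only a worked illustration; the variable names are chosen here for readability.

```python
import numpy as np

d = np.array([0.000, 0.166, 0.333])  # ideal target distances
p = np.array([0.000, 0.254, 0.998])  # measured performance

differences = d - p                # step 1: signed differences
squares = differences ** 2         # step 2: squares, never negative
mean_square = squares.mean()       # step 3: average over the elements
rmse_val = np.sqrt(mean_square)    # step 4: undo the squaring

print("differences:", differences)
print("squares:    ", squares)
print("mean square:", mean_square)
print("rmse:       ", rmse_val)
```

The final value should match the RMSE printed in the earlier example, to within display precision.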

If the RMSE value goes down over time we are happy because variance is decreasing.

Minimizing RMSE isn't always the most accurate line-fitting strategy; total least squares can be better:

Root mean squared error measures the vertical distance between each point and the line, so if your data is shaped like a banana, flat near the bottom and steep near the top, then the RMSE will report greater distances to the high points but shorter distances to the low points, when in fact the perpendicular distances are equivalent. This skews the fit so that the line prefers to be closer to the high points than to the low ones.

If this is a problem, the total least squares method fixes it: https://mubaris.com/posts/linear-regression
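
For comparison, here is a minimal sketch of a total least squares line fit that minimizes perpendicular rather than vertical distances. It uses the singular value decomposition of the centered points, which is one common way to do this and is not necessarily the method in the linked article. fit_line_tls is a hypothetical helper and the sample data is made up.

```python
import numpy as np

def fit_line_tls(x, y):
    """Fit y = slope * x + intercept by minimizing perpendicular distances."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    centroid = np.array([x.mean(), y.mean()])
    centered = np.column_stack([x, y]) - centroid
    # The first right singular vector points along the direction of largest
    # spread, which is the direction of the total least squares line.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    dx, dy = vt[0]
    slope = dy / dx  # assumes the fitted line is not vertical
    intercept = centroid[1] - slope * centroid[0]
    return slope, intercept

# Made-up noisy points roughly along y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

tls_slope, tls_intercept = fit_line_tls(x, y)
ols_slope, ols_intercept = np.polyfit(x, y, 1)  # ordinary least squares
print("TLS:", tls_slope, tls_intercept)
print("OLS:", ols_slope, ols_intercept)
```

On nearly linear data like this the two fits are close; they diverge more when the noise in x is comparable to the noise in y.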

Gotchas that can break this RMSE function:

If there are nulls or infinities in either input list, the output RMSE value is not going to make sense. There are three strategies to deal with nulls, missing values, or infinities in either list: ignore that component, zero it out, or substitute a best guess or uniform random noise. Each remedy has its pros and cons depending on what your data means. In general, ignoring any component with a missing value is preferred, but this can bias the RMSE toward zero, making you think performance has improved when it really hasn't. Adding random noise to a best guess could be preferred if there are lots of missing values.

To guarantee the relative correctness of the RMSE output, you must eliminate all nulls and infinities from the input.
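
A minimal sketch of the "ignore that component" strategy, assuming missing values are encoded as NaN. rmse_finite is a hypothetical helper, not a library function, and the sample values are made up.

```python
import numpy as np

def rmse_finite(predictions, targets):
    """Hypothetical helper: RMSE over pairs where both values are finite."""
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Keep only positions where neither value is NaN or +/-infinity.
    keep = np.isfinite(predictions) & np.isfinite(targets)
    if not keep.any():
        raise ValueError("no finite pairs left to compare")
    return np.sqrt(np.mean((predictions[keep] - targets[keep]) ** 2))

# Made-up data: only the first two pairs are fully finite.
predictions = [1.0, 2.0, np.nan, 4.0, np.inf]
targets = [1.0, 4.0, 3.0, np.nan, 5.0]

print(rmse_finite(predictions, targets))
```

Because dropped pairs contribute nothing, the result describes only the pairs that survived, which is the bias mentioned above.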

RMSE has zero tolerance for outlier data points which don't belong

Root mean squared error relies on all data being right, and all points are counted as equal. That means one stray point that's way out in left field is going to totally ruin the whole calculation. To handle outlier data points and dismiss their tremendous influence after a certain threshold, see robust estimators, which build in a threshold for dismissing outliers.
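
To make the outlier sensitivity concrete, the sketch below compares RMSE with the median absolute error on made-up data. The median is used here only as a simple robust contrast; it is not one of the threshold-based robust estimators referred to above.

```python
import numpy as np

def rmse(predictions, targets):
    return np.sqrt(((predictions - targets) ** 2).mean())

# Made-up data: the second prediction list has one wild outlier.
targets = np.array([10.0, 10.0, 10.0, 10.0, 10.0])
clean = np.array([11.0, 9.0, 11.0, 9.0, 11.0])
with_outlier = np.array([11.0, 9.0, 11.0, 9.0, 110.0])

rmse_clean = rmse(clean, targets)
rmse_outlier = rmse(with_outlier, targets)
median_abs_clean = np.median(np.abs(clean - targets))
median_abs_outlier = np.median(np.abs(with_outlier - targets))

print("RMSE:", rmse_clean, "->", rmse_outlier)
print("median absolute error:", median_abs_clean, "->", median_abs_outlier)
```

A single stray point multiplies the RMSE many times over, while the median absolute error does not move.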

Answered by dataista

Just in case someone finds this thread in 2019, there is a library called ml_metrics which is available without pre-installation in Kaggle's kernels, is pretty lightweight, and is accessible through PyPI (it can be installed easily and quickly with pip install ml_metrics):

from ml_metrics import rmse
rmse(actual=[0, 1, 2], predicted=[1, 10, 5])
# 5.507570547286102

It has a few other interesting metrics that are not available in sklearn, like mapk.
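
The value in the comment above can be reproduced with plain NumPy, which is a handy sanity check if ml_metrics is not installed. This sketch does not call ml_metrics at all.

```python
import numpy as np

actual = np.array([0, 1, 2], dtype=float)
predicted = np.array([1, 10, 5], dtype=float)

# Squared errors are 1, 81 and 9; their mean is 91 / 3.
value = np.sqrt(np.mean((actual - predicted) ** 2))
print(value)
```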

Answered by Georges

Here's example code that calculates the RMSE between two files in the Polygon File Format (PLY). It uses both the ml_metrics lib and np.linalg.norm:

import sys
from pyntcloud import PyntCloud as pc
import numpy as np
from ml_metrics import rmse

if len(sys.argv) < 3 or sys.argv[1] in ("-h", "--help"):
    print("Usage: compute-rmse.py <input1.ply> <input2.ply>")
    sys.exit(1)

def verify_rmse(a, b):
    # RMSE via the Euclidean norm, as a cross-check for ml_metrics.rmse
    n = len(a)
    return np.linalg.norm(np.array(b) - np.array(a)) / np.sqrt(n)

def compare(a, b):
    # Load the vertex tables of both PLY files
    m = pc.from_file(a).points
    n = pc.from_file(b).points
    # Only the x coordinates are compared, as in the original code;
    # both files must contain the same number of vertices
    m = tuple(m.x)
    n = tuple(n.x)
    v1, v2 = verify_rmse(m, n), rmse(m, n)
    print(v1, v2)

compare(sys.argv[1], sys.argv[2])

Answered by KeyMaker00

Or simply use only NumPy functions:

def rmse(y, y_pred):
    return np.sqrt(np.mean(np.square(y - y_pred)))

Where:

  • y is my target
  • y_pred is my prediction

Note that rmse(y, y_pred) == rmse(y_pred, y) because the difference is squared.
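
A small sketch of the symmetry noted above. It also converts the inputs with np.asarray, an addition not in the original answer, because subtracting two plain Python lists raises a TypeError. The sample values are made up.

```python
import numpy as np

def rmse(y, y_pred):
    # np.asarray lets the function accept plain lists as well as arrays.
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean(np.square(y - y_pred)))

y = [3.0, -0.5, 2.0, 7.0]       # targets
y_pred = [2.5, 0.0, 2.0, 8.0]   # predictions

forward = rmse(y, y_pred)
backward = rmse(y_pred, y)
print(forward, backward)
```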

Answered by Usman Zafar

  1. No, not directly. However, scikit-learn, a machine learning library for Python, has a function for mean squared error, linked below:

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html

  2. The function is named mean_squared_error, as given below, where y_true holds the true target values for the data tuples and y_pred holds the values predicted by the machine learning algorithm you are using:

mean_squared_error(y_true, y_pred)

  3. You have to take the square root of its result to get the RMSE (using Python's sqrt function). This process is described at this link: https://www.codeastar.com/regression-model-rmsd/

So the final code would be something like:

from sklearn.metrics import mean_squared_error
from math import sqrt

RMSD = sqrt(mean_squared_error(testing_y, prediction))
print(RMSD)

Answered by jeffhale

In scikit-learn 0.22.0 you can pass mean_squared_error() the argument squared=False to return the RMSE.

from sklearn.metrics import mean_squared_error

mean_squared_error(y_actual, y_predicted, squared=False)

Answered by user12999612

You can't find an RMSE function directly in sklearn. But instead of manually taking the square root, there is another standard way using sklearn: its mean_squared_error function has a parameter called squared, with a default value of True. If we set it to False, the same function returns the RMSE instead of the MSE.

# code changes implemented by Esha Prakash
from sklearn.metrics import mean_squared_error
rmse = mean_squared_error(y_true, y_pred, squared=False)