Python TensorFlow mean squared error loss function

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you reuse it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/41338509/


Tensorflow mean squared error loss function

python machine-learning tensorflow

Asked by Nitro

I have seen a few different mean squared error loss functions in various posts for regression models in Tensorflow:


loss = tf.reduce_sum(tf.pow(prediction - Y,2))/(n_instances)
loss = tf.reduce_mean(tf.squared_difference(prediction, Y))
loss = tf.nn.l2_loss(prediction - Y)

What are the differences between these?


Accepted answer by Javier Martín

I would say that the third equation is different, while the 1st and 2nd are formally the same but behave differently due to numerical concerns.


I think that the 3rd equation (using l2_loss) just returns 1/2 of the squared Euclidean norm, that is, the sum of the element-wise squares of the input, which is x = prediction - Y. You are not dividing by the number of samples anywhere. Thus, if you have a very large number of samples, the computation may overflow (returning Inf).

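A quick way to check that relationship (a minimal sketch, in the same TF 1.x style as the code below; here x simply stands in for prediction - Y):

import tensorflow as tf

x = tf.random_normal(shape=(8, 3))             # stands in for prediction - Y

l2 = tf.nn.l2_loss(x)                          # half of the sum of squared elements
half_sum_sq = tf.reduce_sum(tf.square(x)) / 2  # the same quantity written out explicitly

with tf.Session() as sess:
    print(sess.run([l2, half_sum_sq]))         # both values match; neither divides by 8*3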

The other two are formally the same, computing the mean of the element-wise squared x tensor. However, while the documentation does not specify it explicitly, it is very likely that reduce_mean uses an algorithm adapted to avoid overflowing with a very large number of samples. In other words, it likely does not try to sum everything first and then divide by N, but uses some kind of rolling mean that can adapt to an arbitrary number of samples without necessarily causing an overflow.

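To make that "rolling mean" idea concrete, here is a minimal NumPy sketch of an incremental mean update. This is only an illustration of the idea, not TensorFlow's actual reduce_mean implementation, which is not documented here:

import numpy as np

def running_mean(values):
    # Incremental update: mean_n = mean_{n-1} + (x_n - mean_{n-1}) / n.
    # The full sum is never formed, so the running value stays at the scale
    # of the data instead of growing with the number of samples.
    mean = np.float32(0.0)
    for n, v in enumerate(values, start=1):
        mean += (np.float32(v) - mean) / np.float32(n)
    return mean

print(running_mean([1.0] * 1000))  # 1.0, without ever storing a sum of 1000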

Answered by Salvador Dali

The first and the second loss functions calculate the same thing, but in a slightly different way. The third function calculates something completely different. You can see this by executing this code:


import tensorflow as tf           # TF 1.x-style API (tf.Session, tf.random_normal)
from functools import reduce      # needed for reduce() on Python 3

shape_obj = (100, 6, 12)
Y1 = tf.random_normal(shape=shape_obj)
Y2 = tf.random_normal(shape=shape_obj)

# total number of elements in the tensors
n_elements = reduce(lambda x, y: x * y, shape_obj)

loss1 = tf.reduce_sum(tf.pow(Y1 - Y2, 2)) / n_elements   # sum of squares / N
loss2 = tf.reduce_mean(tf.squared_difference(Y1, Y2))    # mean of squared differences
loss3 = tf.nn.l2_loss(Y1 - Y2)                           # half the sum of squares, no division by N

with tf.Session() as sess:
    print(sess.run([loss1, loss2, loss3]))
# when I run it I got: [2.0291963, 2.0291963, 7305.1069]

Now you can verify that the 1st and the 2nd calculate the same thing (in theory) by noticing that tf.pow(a - b, 2) is the same as tf.squared_difference(a, b). Also, reduce_mean is the same as reduce_sum / number_of_elements. The thing is that computers can't calculate everything exactly. To see what numerical instabilities can do to your calculations, take a look at this:


import tensorflow as tf
from functools import reduce

shape_obj = (5000, 5000, 10)      # 250,000,000 elements, every difference equal to 1
Y1 = tf.zeros(shape=shape_obj)
Y2 = tf.ones(shape=shape_obj)

n_elements = reduce(lambda x, y: x * y, shape_obj)

loss1 = tf.reduce_sum(tf.pow(Y1 - Y2, 2)) / n_elements
loss2 = tf.reduce_mean(tf.squared_difference(Y1, Y2))

with tf.Session() as sess:
    print(sess.run([loss1, loss2]))

It is easy to see that the answer should be 1, but you will get something like this: [1.0, 0.26843545].

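The culprit is float32 accumulation: a float32 value has a 24-bit significand, so once a running sum reaches about 2**24 = 16,777,216, adding 1.0 to it no longer changes it. Here is a tiny NumPy check of that effect (the exact value TensorFlow returns also depends on the order in which its reduction kernel combines partial sums):

import numpy as np

big = np.float32(2 ** 24)               # 16,777,216
print(big + np.float32(1.0) == big)     # True: the added 1.0 is rounded away

# A naive left-to-right float32 sum of 250,000,000 ones therefore stalls far
# below the true total, and any mean derived from such a sum comes out too small.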

Regarding your last function, the documentation says that:


Computes half the L2 norm of a tensor without the sqrt: output = sum(t ** 2) / 2


So if you want it to calculate the same thing (in theory) as the first one, you need to scale it appropriately:


loss3 = tf.nn.l2_loss(Y1 - Y2) * 2 / n_elements
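
As a quick sanity check (a minimal self-contained sketch in the same TF 1.x style; the exact numbers vary from run to run because the inputs are random), the rescaled l2_loss now agrees with the reduce_mean version up to floating-point error:

import tensorflow as tf
from functools import reduce

shape_obj = (100, 6, 12)
Y1 = tf.random_normal(shape=shape_obj)
Y2 = tf.random_normal(shape=shape_obj)
n_elements = reduce(lambda x, y: x * y, shape_obj)

mse = tf.reduce_mean(tf.squared_difference(Y1, Y2))
scaled_l2 = tf.nn.l2_loss(Y1 - Y2) * 2 / n_elements

with tf.Session() as sess:
    print(sess.run([mse, scaled_l2]))   # both values are roughly 2.0 and agree closely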