Python - What are logits, softmax and softmax_cross_entropy_with_logits?

Disclaimer: this page is a mirror of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me) on StackOverflow. Original question: http://stackoverflow.com/questions/34240703/

What are logits, softmax and softmax_cross_entropy_with_logits?

python, machine-learning, tensorflow

Asked by Shubhashis

I was going through the tensorflow API docs here. In the tensorflow documentation, they used a keyword called logits. What is it? In a lot of methods in the API docs it is written like

tf.nn.softmax(logits, name=None)

If these logits are only Tensors, why keep a different name like logits?

Another thing is that there are two methods I could not differentiate. They were

tf.nn.softmax(logits, name=None)
tf.nn.softmax_cross_entropy_with_logits(logits, labels, name=None)

What are the differences between them? The docs are not clear to me. I know what tf.nn.softmax does, but not the other. An example would be really helpful.

Accepted answer by dga

Logits simply means that the function operates on the unscaled output of earlier layers and that the relative scale to understand the units is linear. It means, in particular, the sum of the inputs may not equal 1, that the values are not probabilities (you might have an input of 5).
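
For illustration, a tiny made-up sketch: such raw scores can be negative or larger than 1, and a set of them need not sum to 1.

import numpy as np

logits = np.array([5.0, -1.0, 2.3])   # hypothetical raw scores from the last layer
print(logits.sum())                   # 6.3 -- not a probability distribution, and one value is negative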

tf.nn.softmax produces just the result of applying the softmax function to an input tensor. The softmax "squishes" the inputs so that sum(input) = 1: it's a way of normalizing. The shape of the output of a softmax is the same as the input: it just normalizes the values. The outputs of softmax can be interpreted as probabilities.

import numpy as np
import tensorflow as tf
s = tf.Session()
a = tf.constant(np.array([[.1, .3, .5, .9]]))
print(s.run(tf.nn.softmax(a)))
# [[ 0.16838508  0.205666    0.25120102  0.37474789]]

In contrast, tf.nn.softmax_cross_entropy_with_logits computes the cross entropy of the result after applying the softmax function (but it does it all together in a more mathematically careful way). It's similar to the result of:

# Conceptual pseudocode -- cross_entropy here is not an actual TF function:
sm = tf.nn.softmax(x)
ce = cross_entropy(sm)

The cross entropy is a summary metric: it sums across the elements. The output of tf.nn.softmax_cross_entropy_with_logits on a shape [2,5] tensor is of shape [2], i.e. one loss value per example (the first dimension is treated as the batch).
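
For example, a minimal sketch with made-up values, showing that you get one scalar loss per row of the batch:

import numpy as np
import tensorflow as tf

sess = tf.Session()
logits = tf.constant(np.random.randn(2, 5))   # batch of 2 examples, 5 classes (made-up values)
labels = tf.constant(np.eye(5)[[0, 3]])       # one-hot labels for classes 0 and 3
ce = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels)
print(sess.run(ce).shape)                     # (2,) -- one cross-entropy value per example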

If you want to do optimization to minimize the cross entropy AND you're softmaxing after your last layer, you should use tf.nn.softmax_cross_entropy_with_logits instead of doing it yourself, because it covers numerically unstable corner cases in the mathematically right way. Otherwise, you'll end up hacking it by adding little epsilons here and there.
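
To illustrate one such corner case, here is a rough sketch (the logit values are made up): with an extreme logit for the true class, the naive softmax-then-log underflows to log(0) = -inf, while the fused op returns the correct finite loss.

import tensorflow as tf

sess = tf.Session()
logits = tf.constant([[-200.0, 0.0]])   # very negative score for the true class
labels = tf.constant([[1.0, 0.0]])
naive = -tf.reduce_sum(labels * tf.log(tf.nn.softmax(logits)), 1)
fused = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels)
print(sess.run([naive, fused]))         # roughly [array([ inf]), array([ 200.])]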

Edited 2016-02-07: If you have single-class labels, where an object can only belong to one class, you might now consider using tf.nn.sparse_softmax_cross_entropy_with_logits so that you don't have to convert your labels to a dense one-hot array. This function was added after release 0.6.0.
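
A minimal sketch (made-up values) of the sparse variant, which takes integer class indices instead of one-hot rows:

import tensorflow as tf

sess = tf.Session()
logits = tf.constant([[0.5, 1.5, 0.1], [2.2, 1.3, 1.7]])  # 2 examples, 3 classes
class_ids = tf.constant([1, 2])                           # the true class index of each example
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=class_ids)
print(sess.run(loss))                                     # one loss value per example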

Answer by Ian Goodfellow

tf.nn.softmax computes the forward propagation through a softmax layer. You use it during evaluation of the model when you compute the probabilities that the model outputs.

tf.nn.softmax_cross_entropy_with_logits computes the cost for a softmax layer. It is only used during training.

The logits are the unnormalized log probabilities output by the model (the values output before the softmax normalization is applied to them).
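
A quick sketch of what "unnormalized" means here (values made up): adding the same constant to every logit leaves the softmax probabilities unchanged, so logits are only defined up to an additive constant.

import tensorflow as tf

sess = tf.Session()
logits = tf.constant([[1.0, 2.0, 3.0]])
print(sess.run(tf.nn.softmax(logits)))          # [[ 0.09003057  0.24472848  0.66524094]]
print(sess.run(tf.nn.softmax(logits + 100.0)))  # same probabilities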

Answer by stackoverflowuser2010

Short version:

Suppose you have two tensors, where y_hat contains computed scores for each class (for example, from y = W*x + b) and y_true contains one-hot encoded true labels.

y_hat  = ... # Predicted label, e.g. y = tf.matmul(X, W) + b
y_true = ... # True label, one-hot encoded

If you interpret the scores in y_hat as unnormalized log probabilities, then they are logits.

Additionally, the total cross-entropy loss computed in this manner:

y_hat_softmax = tf.nn.softmax(y_hat)
total_loss = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y_hat_softmax), [1]))

is essentially equivalent to the total cross-entropy loss computed with the function softmax_cross_entropy_with_logits():

total_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_hat, y_true))

Long version:

In the output layer of your neural network, you will probably compute an array that contains the class scores for each of your training instances, such as from a computation y_hat = W*x + b. To serve as an example, below I've created y_hat as a 2 x 3 array, where the rows correspond to the training instances and the columns correspond to classes. So here there are 2 training instances and 3 classes.

import tensorflow as tf
import numpy as np

sess = tf.Session()

# Create example y_hat.
y_hat = tf.convert_to_tensor(np.array([[0.5, 1.5, 0.1],[2.2, 1.3, 1.7]]))
sess.run(y_hat)
# array([[ 0.5,  1.5,  0.1],
#        [ 2.2,  1.3,  1.7]])

Note that the values are not normalized (i.e. the rows don't add up to 1). In order to normalize them, we can apply the softmax function, which interprets the input as unnormalized log probabilities (aka logits) and outputs normalized linear probabilities.

y_hat_softmax = tf.nn.softmax(y_hat)
sess.run(y_hat_softmax)
# array([[ 0.227863  ,  0.61939586,  0.15274114],
#        [ 0.49674623,  0.20196195,  0.30129182]])

It's important to fully understand what the softmax output is saying. Below I've shown a table that more clearly represents the output above. It can be seen that, for example, the probability of training instance 1 being "Class 2" is 0.619. The class probabilities for each training instance are normalized, so the sum of each row is 1.0.

                      Pr(Class 1)  Pr(Class 2)  Pr(Class 3)
                    ,--------------------------------------
Training instance 1 | 0.227863   | 0.61939586 | 0.15274114
Training instance 2 | 0.49674623 | 0.20196195 | 0.30129182

So now we have class probabilities for each training instance, where we can take the argmax() of each row to generate a final classification. From the table above, we can conclude that training instance 1 belongs to "Class 2" and training instance 2 belongs to "Class 1".
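
As a quick check, here is that step as code (reusing the sess and y_hat_softmax defined above):

predictions = tf.argmax(y_hat_softmax, 1)   # index of the largest probability in each row
sess.run(predictions)
# array([1, 0])   i.e. "Class 2" for training instance 1 and "Class 1" for training instance 2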

Are these classifications correct? We need to measure against the true labels from the training set. You will need a one-hot encoded y_true array, where again the rows are training instances and columns are classes. Below I've created an example y_true one-hot array where the true label for training instance 1 is "Class 2" and the true label for training instance 2 is "Class 3".

y_true = tf.convert_to_tensor(np.array([[0.0, 1.0, 0.0],[0.0, 0.0, 1.0]]))
sess.run(y_true)
# array([[ 0.,  1.,  0.],
#        [ 0.,  0.,  1.]])

Is the probability distribution in y_hat_softmax close to the probability distribution in y_true? We can use cross-entropy loss to measure the error.

Formula for cross-entropy loss (per training instance n):

loss_n = -sum_c( y_true[n, c] * log(y_hat_softmax[n, c]) )

We can compute the cross-entropy loss on a row-wise basis and see the results. Below we can see that training instance 1 has a loss of 0.479, while training instance 2 has a higher loss of 1.200. This result makes sense because in our example above, y_hat_softmax showed that training instance 1's highest probability was for "Class 2", which matches training instance 1 in y_true; however, the prediction for training instance 2 showed the highest probability for "Class 1", which does not match the true class "Class 3".

loss_per_instance_1 = -tf.reduce_sum(y_true * tf.log(y_hat_softmax), reduction_indices=[1])
sess.run(loss_per_instance_1)
# array([ 0.4790107 ,  1.19967598])

What we really want is the total loss over all the training instances. So we can compute:

total_loss_1 = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y_hat_softmax), reduction_indices=[1]))
sess.run(total_loss_1)
# 0.83934333897877944

Using softmax_cross_entropy_with_logits()

We can instead compute the total cross-entropy loss using the tf.nn.softmax_cross_entropy_with_logits() function, as shown below.

loss_per_instance_2 = tf.nn.softmax_cross_entropy_with_logits(y_hat, y_true)
sess.run(loss_per_instance_2)
# array([ 0.4790107 ,  1.19967598])

total_loss_2 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_hat, y_true))
sess.run(total_loss_2)
# 0.83934333897877922

Note that total_loss_1 and total_loss_2 produce essentially equivalent results with some small differences in the very final digits. However, you might as well use the second approach: it takes one less line of code and accumulates less numerical error because the softmax is done for you inside of softmax_cross_entropy_with_logits().
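
One caveat: later TensorFlow 1.x releases require this function to be called with named arguments, so if the positional calls above error out in your version, pass the tensors by keyword:

total_loss_2 = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=y_hat))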

Answer by Abish

The answers above describe the question in enough detail.

Adding to that, Tensorflow has optimised the fused operation of applying the activation function and then calculating the cost, compared with applying the activation and the cost function separately. Hence it is good practice to use tf.nn.softmax_cross_entropy_with_logits() over tf.nn.softmax() followed by your own cross-entropy computation.

You can see a noticeable difference between the two approaches in a resource-intensive model.

Answer by prosti

Whatever goes into softmax is a logit; this is what Geoffrey Hinton repeats in the Coursera videos all the time.

Answer by Tensorflow Support

Tensorflow 2.0 Compatible Answer: The explanations of dga and stackoverflowuser2010 are very detailed about logits and the related functions.

All those functions work fine when used in Tensorflow 1.x, but if you migrate your code from 1.x (1.14, 1.15, etc.) to 2.x (2.0, 2.1, etc.), using those functions results in errors.

Hence, for the benefit of the community, here are the 2.0-compatible calls for all the functions discussed above, for use when migrating from 1.x to 2.x.

Functions in 1.x:

  1. tf.nn.softmax
  2. tf.nn.softmax_cross_entropy_with_logits
  3. tf.nn.sparse_softmax_cross_entropy_with_logits

Respective Functions when Migrated from 1.x to 2.x:

  1. tf.compat.v2.nn.softmax
  2. tf.compat.v2.nn.softmax_cross_entropy_with_logits
  3. tf.compat.v2.nn.sparse_softmax_cross_entropy_with_logits

For more information about migration from 1.x to 2.x, please refer to this Migration Guide.
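
For example, a minimal sketch (made-up values) of what a 2.x-style call looks like with eager execution; these functions are also available directly as tf.nn.softmax and tf.nn.softmax_cross_entropy_with_logits in 2.x, with labels and logits passed by keyword:

import tensorflow as tf  # assuming TF 2.x, eager execution enabled by default

logits = tf.constant([[2.0, 1.0, 0.1]])   # made-up scores for one example, three classes
labels = tf.constant([[1.0, 0.0, 0.0]])   # one-hot true label
probs = tf.nn.softmax(logits)
loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
print(probs.numpy(), loss.numpy())        # probabilities and per-example loss, no Session needed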

Answer by vipin bansal

One more thing I would definitely like to highlight: a logit is just a raw output, generally the output of the last layer. It can be a negative value as well. If we use it as-is for the "cross entropy" evaluation mentioned below:

-tf.reduce_sum(y_true * tf.log(logits))

then it won't work, as the log of a negative number is not defined. Applying the softmax activation first overcomes this problem.
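
A small sketch of this (made-up values, TF 1.x style like the snippet above): a negative logit makes tf.log produce nan, while taking the log of the softmax output gives a valid loss.

import tensorflow as tf

sess = tf.Session()
logits = tf.constant([[2.0, -1.0]])   # raw outputs can be negative
y_true = tf.constant([[1.0, 0.0]])
print(sess.run(-tf.reduce_sum(y_true * tf.log(logits))))                 # nan, since log(-1.0) is undefined
print(sess.run(-tf.reduce_sum(y_true * tf.log(tf.nn.softmax(logits)))))  # ~0.0486, a valid loss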

This is my understanding; please correct me if I'm wrong.
