Python: What are logits, softmax and softmax_cross_entropy_with_logits?
Original URL: http://stackoverflow.com/questions/34240703/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
What are logits, softmax and softmax_cross_entropy_with_logits?
Asked by Shubhashis
I was going through the tensorflow API docs here. In the tensorflow documentation, they used a keyword called logits. What is it? In a lot of methods in the API docs it is written like
tf.nn.softmax(logits, name=None)
If those logits are just Tensors, why keep a different name like logits?
Another thing is that there are two methods I could not differentiate. They were
tf.nn.softmax(logits, name=None)
tf.nn.softmax_cross_entropy_with_logits(logits, labels, name=None)
What are the differences between them? The docs are not clear to me. I know what tf.nn.softmax does, but not the other. An example would be really helpful.
Accepted answer by dga
Logits simply means that the function operates on the unscaled output of earlier layers and that the relative scale used to understand the units is linear. It means, in particular, that the sum of the inputs may not equal 1 and that the values are not probabilities (you might have an input of 5).
tf.nn.softmax produces just the result of applying the softmax function to an input tensor. The softmax "squishes" the inputs so that sum(input) = 1: it is a way of normalizing. The shape of the output of a softmax is the same as the input: it just normalizes the values. The outputs of softmax can be interpreted as probabilities.
import numpy as np
import tensorflow as tf   # TF 1.x
s = tf.Session()
a = tf.constant(np.array([[.1, .3, .5, .9]]))
print(s.run(tf.nn.softmax(a)))
# [[ 0.16838508  0.205666    0.25120102  0.37474789]]
In contrast, tf.nn.softmax_cross_entropy_with_logits computes the cross entropy of the result after applying the softmax function (but it does it all together in a more mathematically careful way). It's similar to the result of:
sm = tf.nn.softmax(x)
ce = cross_entropy(sm)   # schematic: cross_entropy() here stands for a hand-written cross-entropy computation
The cross entropy is a summary metric: it sums across the elements. The output of tf.nn.softmax_cross_entropy_with_logits on a shape [2,5] tensor is of shape [2]: one cross-entropy value per example (the first dimension is treated as the batch).
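As a quick check of that shape claim, here is a minimal sketch that reuses the TF 1.x session s and the numpy import from the example above; the values are random and purely illustrative:
logits = tf.constant(np.random.randn(2, 5))
labels = tf.constant(np.array([[0., 0., 1., 0., 0.],
                               [1., 0., 0., 0., 0.]]))
xent = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
print(s.run(xent).shape)
# (2,)  -- one cross-entropy value per batch element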
If you want to do optimization to minimize the cross entropy AND you're softmaxing after your last layer, you should use tf.nn.softmax_cross_entropy_with_logits instead of doing it yourself, because it covers numerically unstable corner cases in the mathematically right way. Otherwise, you'll end up hacking it by adding little epsilons here and there.
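To make the corner case concrete, here is a small sketch with deliberately extreme, hypothetical logits; the two-step version produces nan while the fused op stays finite:
x = tf.constant([[1000.0, 0.0]])
y = tf.constant([[1.0, 0.0]])
naive = -tf.reduce_sum(y * tf.log(tf.nn.softmax(x)), 1)                # 0 * -inf -> nan
fused = tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=x)    # finite, ~0.0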
Edited 2016-02-07: If you have single-class labels, where an object can only belong to one class, you might now consider using tf.nn.sparse_softmax_cross_entropy_with_logits so that you don't have to convert your labels to a dense one-hot array. This function was added after release 0.6.0.
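A hedged sketch of the difference, with hypothetical values: logits have shape [batch, num_classes], and the sparse variant takes integer class indices instead of one-hot rows.
logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.2, 1.7]])
class_ids = tf.constant([1, 2])   # integer class indices, not one-hot vectors
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=class_ids, logits=logits)
# equivalent to the dense version with one-hot labels [[0,1,0], [0,0,1]]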
Answered by Ian Goodfellow
tf.nn.softmax computes the forward propagation through a softmax layer. You use it during evaluation of the model, when you compute the probabilities that the model outputs.
tf.nn.softmax_cross_entropy_with_logits computes the cost for a softmax layer. It is only used during training.
The logits are the unnormalized log probabilities output by the model (the values output before the softmax normalization is applied to them).
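A minimal sketch of how the two functions are typically split between training and evaluation (hypothetical shapes: 4 features, 3 classes; assumes TF 1.x and import tensorflow as tf):
x = tf.placeholder(tf.float32, [None, 4])
y_true = tf.placeholder(tf.float32, [None, 3])        # one-hot labels
W = tf.Variable(tf.zeros([4, 3]))
b = tf.Variable(tf.zeros([3]))
logits = tf.matmul(x, W) + b                          # raw, unnormalized scores
train_loss = tf.reduce_mean(                          # training-time cost
    tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=logits))
probabilities = tf.nn.softmax(logits)                 # evaluation-time class probabilities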
Answered by stackoverflowuser2010
Short version:
Suppose you have two tensors, where y_hat contains computed scores for each class (for example, from y = W*x + b) and y_true contains one-hot encoded true labels.
y_hat = ... # Predicted label, e.g. y = tf.matmul(X, W) + b
y_true = ... # True label, one-hot encoded
If you interpret the scores in y_hat as unnormalized log probabilities, then they are logits.
Additionally, the total cross-entropy loss computed in this manner:
y_hat_softmax = tf.nn.softmax(y_hat)
total_loss = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y_hat_softmax), [1]))
is essentially equivalent to the total cross-entropy loss computed with the function softmax_cross_entropy_with_logits():
total_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_hat, y_true))
Long version:
In the output layer of your neural network, you will probably compute an array that contains the class scores for each of your training instances, such as from a computation y_hat = W*x + b. To serve as an example, below I've created y_hat as a 2 x 3 array, where the rows correspond to the training instances and the columns correspond to classes. So here there are 2 training instances and 3 classes.
import tensorflow as tf
import numpy as np
sess = tf.Session()
# Create example y_hat.
y_hat = tf.convert_to_tensor(np.array([[0.5, 1.5, 0.1],[2.2, 1.3, 1.7]]))
sess.run(y_hat)
# array([[ 0.5, 1.5, 0.1],
# [ 2.2, 1.3, 1.7]])
Note that the values are not normalized (i.e. the rows don't add up to 1). In order to normalize them, we can apply the softmax function, which interprets the input as unnormalized log probabilities (aka logits) and outputs normalized linear probabilities.
y_hat_softmax = tf.nn.softmax(y_hat)
sess.run(y_hat_softmax)
# array([[ 0.227863 , 0.61939586, 0.15274114],
# [ 0.49674623, 0.20196195, 0.30129182]])
It's important to fully understand what the softmax output is saying. Below I've shown a table that more clearly represents the output above. It can be seen that, for example, the probability of training instance 1 being "Class 2" is 0.619. The class probabilities for each training instance are normalized, so the sum of each row is 1.0.
                    | Pr(Class 1) | Pr(Class 2) | Pr(Class 3)
--------------------|-------------|-------------|------------
Training instance 1 | 0.227863    | 0.61939586  | 0.15274114
Training instance 2 | 0.49674623  | 0.20196195  | 0.30129182
So now we have class probabilities for each training instance, where we can take the argmax() of each row to generate a final classification. From the above, we may conclude that training instance 1 belongs to "Class 2" and training instance 2 belongs to "Class 1".
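For completeness, a small sketch of that argmax step (reusing sess and y_hat_softmax from above):
predictions = tf.argmax(y_hat_softmax, 1)   # index of the largest probability in each row
sess.run(predictions)
# array([1, 0])  -- instance 1 is predicted "Class 2", instance 2 is predicted "Class 1"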
Are these classifications correct? We need to measure against the true labels from the training set. You will need a one-hot encoded y_true array, where again the rows are training instances and the columns are classes. Below I've created an example y_true one-hot array where the true label for training instance 1 is "Class 2" and the true label for training instance 2 is "Class 3".
y_true = tf.convert_to_tensor(np.array([[0.0, 1.0, 0.0],[0.0, 0.0, 1.0]]))
sess.run(y_true)
# array([[ 0., 1., 0.],
# [ 0., 0., 1.]])
Is the probability distribution in y_hat_softmax close to the probability distribution in y_true? We can use cross-entropy loss to measure the error.
We can compute the cross-entropy loss on a row-wise basis and see the results. Below we can see that training instance 1 has a loss of 0.479, while training instance 2 has a higher loss of 1.200. This result makes sense because in our example above, y_hat_softmax showed that training instance 1's highest probability was for "Class 2", which matches training instance 1 in y_true; however, the prediction for training instance 2 showed the highest probability for "Class 1", which does not match the true class "Class 3".
loss_per_instance_1 = -tf.reduce_sum(y_true * tf.log(y_hat_softmax), reduction_indices=[1])
sess.run(loss_per_instance_1)
# array([ 0.4790107 , 1.19967598])
What we really want is the total loss over all the training instances. So we can compute:
total_loss_1 = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y_hat_softmax), reduction_indices=[1]))
sess.run(total_loss_1)
# 0.83934333897877944
Using softmax_cross_entropy_with_logits()
We can instead compute the total cross-entropy loss using the tf.nn.softmax_cross_entropy_with_logits() function, as shown below.
loss_per_instance_2 = tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=y_hat)
sess.run(loss_per_instance_2)
# array([ 0.4790107 , 1.19967598])
total_loss_2 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=y_hat))
sess.run(total_loss_2)
# 0.83934333897877922
Note that total_loss_1 and total_loss_2 produce essentially equivalent results, with some small differences in the very final digits. However, you might as well use the second approach: it takes one less line of code and accumulates less numerical error, because the softmax is done for you inside of softmax_cross_entropy_with_logits().
Answered by Abish
The answers above describe the asked question in enough detail.
Adding to that, Tensorflow has optimised the fused operation of applying the activation function and then calculating the cost from it. Hence it is good practice to use tf.nn.softmax_cross_entropy_with_logits() rather than calling tf.nn.softmax() and then computing the cross entropy yourself.
You can see a noticeable difference between them in a resource-intensive model.
Answered by prosti
Whatever goes into softmax is a logit; this is what Geoffrey Hinton repeats in his Coursera videos all the time.
Answered by Tensorflow Support
Tensorflow 2.0 Compatible Answer: The explanations by dga and stackoverflowuser2010 are very detailed about logits and the related functions.
All those functions work fine when used in Tensorflow 1.x, but if you migrate your code from 1.x (1.14, 1.15, etc.) to 2.x (2.0, 2.1, etc.), using those functions results in errors.
Hence, for the benefit of the community, here are the 2.0-compatible calls for all the functions discussed above, for use when migrating from 1.x to 2.x.
Functions in 1.x:
tf.nn.softmax
tf.nn.softmax_cross_entropy_with_logits
tf.nn.sparse_softmax_cross_entropy_with_logits
Respective Functions when Migrated from 1.x to 2.x:
tf.compat.v2.nn.softmax
tf.compat.v2.nn.softmax_cross_entropy_with_logits
tf.compat.v2.nn.sparse_softmax_cross_entropy_with_logits
For more information about migration from 1.x to 2.x, please refer to this Migration Guide.
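As a hedged illustration, the same two operations in plain TF 2.x eager mode look roughly like this (hypothetical example values; in 2.x the tf.nn names resolve to the v2 implementations):
import tensorflow as tf   # 2.x

logits = tf.constant([[2.0, 1.0, 0.1]])
labels = tf.constant([[0.0, 1.0, 0.0]])
probs = tf.nn.softmax(logits)                                                   # inference-time probabilities
loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)    # per-example training loss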
Answered by vipin bansal
One more thing that I would like to highlight: a logit is just a raw output, generally the output of the last layer. It can be a negative value as well. If we use it as-is for the "cross entropy" evaluation mentioned below:
-tf.reduce_sum(y_true * tf.log(logits))
then it won't work, since the log of a negative value is not defined. Applying a softmax activation first will overcome this problem.
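A tiny sketch of that failure mode (hypothetical values, TF 1.x style):
logits = tf.constant([[2.0, -1.0, 0.5]])   # raw scores; note the negative entry
bad = tf.log(logits)                       # log of a negative number evaluates to nan
ok = tf.log(tf.nn.softmax(logits))         # softmax outputs lie in (0, 1), so the log is defined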
This is my understanding; please correct me if I'm wrong.