How to choose cross-entropy loss in TensorFlow?

Original question: http://stackoverflow.com/questions/47034888/

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Maxim
Classification problems, such as logistic regression or multinomial logistic regression, optimize a cross-entropy loss. Normally, the cross-entropy layer follows the softmax layer, which produces a probability distribution.
In TensorFlow, there are at least a dozen different cross-entropy loss functions:
- tf.losses.softmax_cross_entropy
- tf.losses.sparse_softmax_cross_entropy
- tf.losses.sigmoid_cross_entropy
- tf.contrib.losses.softmax_cross_entropy
- tf.contrib.losses.sigmoid_cross_entropy
- tf.nn.softmax_cross_entropy_with_logits
- tf.nn.sigmoid_cross_entropy_with_logits
- ...
Which one works only for binary classification and which are suitable for multi-class problems? When should you use sigmoid instead of softmax? How are the sparse functions different from the others, and why is it only softmax?
Related (more math-oriented) discussion: What are the differences between all these cross-entropy losses in Keras and TensorFlow?
Answered by Maxim
Preliminary facts
In the functional sense, the sigmoid is a partial case of the softmax function when the number of classes equals 2. Both of them do the same operation: transform the logits (see below) to probabilities.
In simple binary classification there's no big difference between the two; however, in the case of multinomial classification, sigmoid allows dealing with non-exclusive labels (a.k.a. multi-labels), while softmax deals with exclusive classes (see below).
A logit (also called a score) is a raw, unscaled value associated with a class, before computing the probability. In terms of neural network architecture, this means that a logit is the output of a dense (fully-connected) layer.
TensorFlow naming is a bit strange: all of the functions below accept logits, not probabilities, and apply the transformation themselves (which is simply more efficient).
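A minimal sketch of the first point, in plain NumPy with a hypothetical logit value: a sigmoid over a single logit gives the same probability as a softmax over the pair [logit, 0].

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))              # subtract the max for numerical stability
    return e / e.sum()

logit = 2.0                                # a raw, unscaled score from a dense layer
print(sigmoid(logit))                      # ~0.881
print(softmax(np.array([logit, 0.0]))[0])  # ~0.881, the same probability
```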
Sigmoid functions family
- tf.nn.sigmoid_cross_entropy_with_logits
- tf.nn.weighted_cross_entropy_with_logits
- tf.losses.sigmoid_cross_entropy
- tf.contrib.losses.sigmoid_cross_entropy (DEPRECATED)
As stated earlier, the sigmoid loss function is for binary classification. But the TensorFlow functions are more general and also allow doing multi-label classification, when the classes are independent. In other words, tf.nn.sigmoid_cross_entropy_with_logits solves N binary classifications at once.
The labels must be one-hot encoded or can contain soft class probabilities.
tf.losses.sigmoid_cross_entropy in addition allows setting the in-batch weights, i.e. making some examples more important than others. tf.nn.weighted_cross_entropy_with_logits allows setting class weights (remember, the classification is binary), i.e. making positive errors larger than negative errors. This is useful when the training data is unbalanced.
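To make the difference concrete, here is a minimal sketch with hypothetical logits and labels, assuming the TensorFlow 1.x graph-mode API used throughout this answer (in 1.x the weighted variant takes targets= rather than labels=):

```python
import tensorflow as tf

# Hypothetical batch: 3 examples, 4 independent (non-exclusive) labels each.
logits = tf.constant([[ 1.0, -2.0,  0.5,  3.0],
                      [-1.0,  2.0,  0.0, -0.5],
                      [ 0.3,  0.3, -1.0,  1.5]])
labels = tf.constant([[1.0, 0.0, 0.0, 1.0],
                      [0.0, 1.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0, 1.0]])

# Element-wise loss of shape [batch_size, num_labels]: N binary problems at once.
per_label = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)

# Same loss reduced to a scalar, with in-batch weights (2nd example counts double).
example_weights = tf.constant([[1.0], [2.0], [0.5]])
weighted = tf.losses.sigmoid_cross_entropy(
    multi_class_labels=labels, logits=logits, weights=example_weights)

# Class weighting for unbalanced data: penalize errors on positives 3x more.
pos_weighted = tf.nn.weighted_cross_entropy_with_logits(
    targets=labels, logits=logits, pos_weight=3.0)

with tf.Session() as sess:
    print(sess.run([per_label, weighted, pos_weighted]))
```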
Softmax functions family
- tf.nn.softmax_cross_entropy_with_logits (DEPRECATED IN 1.5)
- tf.nn.softmax_cross_entropy_with_logits_v2
- tf.losses.softmax_cross_entropy
- tf.contrib.losses.softmax_cross_entropy (DEPRECATED)
These loss functions should be used for multinomial mutually exclusive classification, i.e. pick one out of N classes. Also applicable when N = 2.
The labels must be one-hot encoded or can contain soft class probabilities: a particular example can belong to class A with 50% probability and class B with 50% probability. Note that strictly speaking it doesn't mean that it belongs to both classes, but one can interpret the probabilities this way.
Just like in the sigmoid family, tf.losses.softmax_cross_entropy allows setting the in-batch weights, i.e. making some examples more important than others. As far as I know, as of TensorFlow 1.3, there's no built-in way to set class weights.
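As an illustration, a minimal sketch with hypothetical values, assuming the TensorFlow 1.5+ API, showing one-hot labels and in-batch weights:

```python
import tensorflow as tf

# Hypothetical batch: 3 examples, 5 mutually exclusive classes.
logits = tf.constant([[2.0, 1.0, 0.1, -1.0, 0.0],
                      [0.5, 2.5, 0.3,  0.0, 1.0],
                      [1.0, 0.0, 3.0,  0.2, 0.1]])
onehot_labels = tf.one_hot([0, 1, 2], depth=5)   # class indices -> one-hot vectors

# Per-example loss, shape [batch_size].
per_example = tf.nn.softmax_cross_entropy_with_logits_v2(
    labels=onehot_labels, logits=logits)

# Scalar loss with in-batch weights (the third example counts twice).
weighted = tf.losses.softmax_cross_entropy(
    onehot_labels=onehot_labels, logits=logits,
    weights=tf.constant([1.0, 1.0, 2.0]))

with tf.Session() as sess:
    print(sess.run([per_example, weighted]))
```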
[UPD] In TensorFlow 1.5, the v2 version was introduced and the original softmax_cross_entropy_with_logits loss got deprecated. The only difference between them is that in the newer version, backpropagation happens into both logits and labels (here's a discussion of why this may be useful).
Sparse functions family
- tf.nn.sparse_softmax_cross_entropy_with_logits
- tf.losses.sparse_softmax_cross_entropy
- tf.contrib.losses.sparse_softmax_cross_entropy (DEPRECATED)
Like ordinary softmax above, these loss functions should be used for multinomial mutually exclusive classification, i.e. pick one out of N classes. The difference is in the labels encoding: the classes are specified as integers (class index), not one-hot vectors. Obviously, this doesn't allow soft classes, but it can save some memory when there are thousands or millions of classes. However, note that the logits argument must still contain logits per each class, thus it consumes at least [batch_size, classes] memory.
Like above, the tf.losses version has a weights argument which allows setting the in-batch weights.
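The same hypothetical example as above, but with integer class indices instead of one-hot vectors (a minimal sketch, assuming the TensorFlow 1.x API):

```python
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1, -1.0, 0.0],
                      [0.5, 2.5, 0.3,  0.0, 1.0],
                      [1.0, 0.0, 3.0,  0.2, 0.1]])
labels = tf.constant([0, 1, 2])   # one integer class index per example, not one-hot

# Per-example loss, shape [batch_size].
per_example = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)

# Scalar loss with in-batch weights via the tf.losses version.
weighted = tf.losses.sparse_softmax_cross_entropy(
    labels=labels, logits=logits, weights=tf.constant([1.0, 1.0, 2.0]))

with tf.Session() as sess:
    print(sess.run([per_example, weighted]))
```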
Sampled softmax functions family
These functions provide another alternative for dealing with a huge number of classes. Instead of computing and comparing an exact probability distribution, they compute a loss estimate from a random sample.
The arguments weights and biases specify a separate fully-connected layer that is used to compute the logits for a chosen sample.
Like above, labels are not one-hot encoded, but have the shape [batch_size, num_true].
Sampled functions are only suitable for training. At test time, it's recommended to use a standard softmax loss (either sparse or one-hot) to get an actual distribution.
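A minimal sketch (hypothetical sizes and variable names, assuming the TensorFlow 1.x API) of how tf.nn.sampled_softmax_loss is wired up for training and replaced by a full softmax loss at test time; tf.nn.nce_loss, discussed next, has the same calling convention.

```python
import tensorflow as tf

num_classes, dim, batch_size = 10000, 64, 32          # hypothetical sizes

# weights/biases describe the final dense layer that would produce the class logits.
weights = tf.get_variable("out_w", [num_classes, dim])
biases = tf.get_variable("out_b", [num_classes])

inputs = tf.random_normal([batch_size, dim])          # activations of the previous layer
labels = tf.random_uniform([batch_size, 1], maxval=num_classes, dtype=tf.int64)

# Training: estimate the softmax loss from 100 sampled classes instead of all 10000.
train_loss = tf.nn.sampled_softmax_loss(
    weights=weights, biases=biases, labels=labels, inputs=inputs,
    num_sampled=100, num_classes=num_classes)

# Test time: compute the full logits and use a standard (sparse) softmax loss.
full_logits = tf.matmul(inputs, weights, transpose_b=True) + biases
test_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=tf.squeeze(labels, axis=1), logits=full_logits)
```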
Another alternative loss is tf.nn.nce_loss, which performs noise-contrastive estimation (if you're interested, see this very detailed discussion). I've included this function in the softmax family, because NCE guarantees approximation to softmax in the limit.
Answered by Hamidreza
However, for version 1.5, softmax_cross_entropy_with_logits_v2 must be used instead, passing its arguments as keyword arguments (key=...), for example:
softmax_cross_entropy_with_logits_v2(_sentinel=None, labels=y,
                                     logits=my_prediction, dim=-1, name=None)