Python: How to choose cross-entropy loss in TensorFlow?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/47034888/

How to choose cross-entropy loss in TensorFlow?

python tensorflow neural-network logistic-regression cross-entropy

Asked by Maxim

Classification problems, such as logistic regression or multinomial logistic regression, optimize a cross-entropy loss. Normally, the cross-entropy layer follows the softmax layer, which produces a probability distribution.

In tensorflow, there are at least a dozen different cross-entropy loss functions:

  • tf.losses.softmax_cross_entropy
  • tf.losses.sparse_softmax_cross_entropy
  • tf.losses.sigmoid_cross_entropy
  • tf.contrib.losses.softmax_cross_entropy
  • tf.contrib.losses.sigmoid_cross_entropy
  • tf.nn.softmax_cross_entropy_with_logits
  • tf.nn.sigmoid_cross_entropy_with_logits
  • ...

Which one works only for binary classification, and which are suitable for multi-class problems? When should you use sigmoid instead of softmax? How are sparse functions different from the others, and why is it only softmax?

Related (more math-oriented) discussion: What are the differences between all these cross-entropy losses in Keras and TensorFlow?

Answer by Maxim

Preliminary facts

  • In a functional sense, the sigmoid is a special case of the softmax function when the number of classes equals 2. Both of them do the same operation: transform the logits (see below) into probabilities.

    In simple binary classification, there's no big difference between the two; however, in the case of multinomial classification, sigmoid allows you to deal with non-exclusive labels (a.k.a. multi-labels), while softmax deals with exclusive classes (see below).

  • A logit (also called a score) is a raw, unscaled value associated with a class, before the probability is computed. In terms of neural network architecture, this means that a logit is the output of a dense (fully-connected) layer.

    Tensorflow naming is a bit strange: all of the functions below accept logits, not probabilities, and apply the transformation themselves (which is simply more efficient); a short sketch of the logit-to-probability step follows this list.

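For illustration, here is a minimal sketch of that logit-to-probability step (assuming TensorFlow 1.x; the logit values are made up):

import tensorflow as tf

# Raw logits, as they would come out of a dense (fully-connected) layer.
logits = tf.constant([[2.0, 1.0, 0.1]])   # shape [batch_size, num_classes]

# softmax turns the logits into one probability distribution over exclusive
# classes (each row sums to 1).
softmax_probs = tf.nn.softmax(logits)

# sigmoid turns each logit into an independent probability in (0, 1)
# (rows need not sum to 1), which is what the multi-label losses rely on.
sigmoid_probs = tf.sigmoid(logits)

with tf.Session() as sess:
    print(sess.run([softmax_probs, sigmoid_probs]))

The cross-entropy functions below fuse this transformation with the loss itself, which is why they expect logits rather than probabilities.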

Sigmoid functions family

As stated earlier, the sigmoid loss function is for binary classification. But tensorflow functions are more general and allow you to do multi-label classification, when the classes are independent. In other words, tf.nn.sigmoid_cross_entropy_with_logits solves N binary classifications at once.

The labels must be one-hot encoded or can contain soft class probabilities.

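As a minimal sketch (TensorFlow 1.x assumed; the labels and logits below are made-up values), three independent yes/no questions are answered for each example:

import tensorflow as tf

# Multi-hot (or soft) targets: each column is an independent binary problem.
labels = tf.constant([[1.0, 0.0, 1.0],
                      [0.0, 1.0, 0.0]])
logits = tf.constant([[ 2.3, -1.2,  0.7],
                      [-0.4,  1.9, -2.0]])   # raw dense-layer outputs

# One sigmoid cross-entropy per label, i.e. N binary classifications at once.
per_label_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
loss = tf.reduce_mean(per_label_loss)

with tf.Session() as sess:
    print(sess.run(loss))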

tf.losses.sigmoid_cross_entropy in addition allows setting the in-batch weights, i.e. making some examples more important than others. tf.nn.weighted_cross_entropy_with_logits allows setting class weights (remember, the classification is binary), i.e. making positive errors larger than negative errors. This is useful when the training data is unbalanced.

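A rough sketch of the two weighting mechanisms (the weight values below are arbitrary, chosen only for illustration):

import tensorflow as tf

labels = tf.constant([[1.0, 0.0],
                      [0.0, 1.0]])
logits = tf.constant([[ 0.5, -0.5],
                      [-1.0,  2.0]])

# In-batch weights: the second example counts twice as much as the first one.
example_weighted = tf.losses.sigmoid_cross_entropy(
    multi_class_labels=labels, logits=logits,
    weights=tf.constant([[1.0], [2.0]]))

# Class weight: errors on positive labels are penalized 5x more than on
# negative ones (note the argument is called `targets` in the 1.x API).
class_weighted = tf.nn.weighted_cross_entropy_with_logits(
    targets=labels, logits=logits, pos_weight=5.0)

with tf.Session() as sess:
    print(sess.run([example_weighted, tf.reduce_mean(class_weighted)]))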

Softmax functions family

These loss functions should be used for multinomial mutually exclusive classification, i.e. pick one out of N classes. Also applicable when N = 2.

The labels must be one-hot encoded or can contain soft class probabilities: a particular example can belong to class A with 50% probability and class B with 50% probability. Note that strictly speaking it doesn't mean that it belongs to both classes, but one can interpret the probabilities this way.

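For example, a minimal sketch with one one-hot row and one soft row (TensorFlow 1.x; the numbers are made up):

import tensorflow as tf

labels = tf.constant([[0.0, 1.0, 0.0],    # definitely class B
                      [0.5, 0.5, 0.0]])   # 50% class A, 50% class B
logits = tf.constant([[1.0, 3.0, -1.0],
                      [2.0, 2.0,  0.1]])

# One loss value per example; reduce to a scalar yourself.
per_example_loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
loss = tf.reduce_mean(per_example_loss)

with tf.Session() as sess:
    print(sess.run(loss))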

Just like in the sigmoid family, tf.losses.softmax_cross_entropy allows setting the in-batch weights, i.e. making some examples more important than others. As far as I know, as of tensorflow 1.3, there's no built-in way to set class weights.

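A short sketch of the in-batch weights (arbitrary values again):

import tensorflow as tf

onehot_labels = tf.constant([[0.0, 1.0],
                             [1.0, 0.0]])
logits = tf.constant([[0.2,  1.5],
                      [2.0, -0.3]])

# Per-example weights: the first example counts three times as much as the second.
loss = tf.losses.softmax_cross_entropy(
    onehot_labels=onehot_labels, logits=logits,
    weights=tf.constant([3.0, 1.0]))

with tf.Session() as sess:
    print(sess.run(loss))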

[UPD] In tensorflow 1.5, the v2 version was introduced and the original softmax_cross_entropy_with_logits loss got deprecated. The only difference between them is that in the newer version, backpropagation happens into both logits and labels (here's a discussion of why this may be useful).

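If you do not want gradients to flow into the labels with the v2 version, the usual approach is to stop them explicitly (a sketch, assuming TensorFlow >= 1.5):

import tensorflow as tf

labels = tf.constant([[0.0, 1.0, 0.0]])
logits = tf.constant([[1.0, 2.0, 0.5]])

# Wrapping the labels in tf.stop_gradient makes the v2 op behave like the
# deprecated one: no gradient is backpropagated into the labels.
loss = tf.nn.softmax_cross_entropy_with_logits_v2(
    labels=tf.stop_gradient(labels), logits=logits)

with tf.Session() as sess:
    print(sess.run(loss))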

Sparse functions family

Like ordinary softmax above, these loss functions should be used for multinomial mutually exclusive classification, i.e. pick one out of N classes. The difference is in the label encoding: the classes are specified as integers (class index), not one-hot vectors. Obviously, this doesn't allow soft classes, but it can save some memory when there are thousands or millions of classes. However, note that the logits argument must still contain logits for each class, thus it consumes at least [batch_size, classes] memory.

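A minimal sketch with integer class indices (TensorFlow 1.x; values are made up):

import tensorflow as tf

# Integer class indices instead of one-hot vectors.
labels = tf.constant([2, 0])                 # shape [batch_size]
logits = tf.constant([[0.1,  1.0, 3.0],
                      [2.5, -0.3, 0.0]])     # still [batch_size, classes]

per_example_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)
loss = tf.reduce_mean(per_example_loss)

with tf.Session() as sess:
    print(sess.run(loss))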

Like above, the tf.losses version has a weights argument which allows setting the in-batch weights.

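For instance (again with arbitrary weights):

import tensorflow as tf

labels = tf.constant([1, 0])
logits = tf.constant([[0.5,  2.0],
                      [1.5, -0.5]])

# weights are per-example (in-batch) weights, as with the dense variant above.
loss = tf.losses.sparse_softmax_cross_entropy(
    labels=labels, logits=logits, weights=tf.constant([1.0, 2.0]))

with tf.Session() as sess:
    print(sess.run(loss))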

Sampled softmax functions family

These functions provide another alternative for dealing with a huge number of classes. Instead of computing and comparing an exact probability distribution, they compute a loss estimate from a random sample.

The arguments weights and biases specify a separate fully-connected layer that is used to compute the logits for a chosen sample.

Like above, labels are not one-hot encoded, but have the shape [batch_size, num_true].

Sampled functions are only suitable for training. At test time, it's recommended to use a standard softmax loss (either sparse or one-hot) to get an actual distribution.

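Putting the pieces together, a rough sketch of the usual pattern (TensorFlow 1.x; all sizes and variable names below are made up for illustration):

import tensorflow as tf

num_classes, dim, batch_size, num_sampled = 10000, 128, 32, 64

# The separate fully-connected layer used by the sampled loss.
out_weights = tf.get_variable("out_weights", [num_classes, dim])
out_biases = tf.get_variable("out_biases", [num_classes])

inputs = tf.random_normal([batch_size, dim])   # activations of the last hidden layer
labels = tf.random_uniform([batch_size, 1], maxval=num_classes, dtype=tf.int64)

# Training: the loss is estimated from num_sampled negative classes
# instead of all num_classes.
train_loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(
    weights=out_weights, biases=out_biases,
    labels=labels, inputs=inputs,
    num_sampled=num_sampled, num_classes=num_classes))

# Evaluation: compute the full logits and use an ordinary (sparse) softmax loss.
full_logits = tf.matmul(inputs, out_weights, transpose_b=True) + out_biases
eval_loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=tf.squeeze(labels, axis=1), logits=full_logits))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([train_loss, eval_loss]))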

Another alternative loss is tf.nn.nce_loss, which performs noise-contrastive estimation (if you're interested, see this very detailed discussion). I've included this function in the softmax family, because NCE guarantees approximation to softmax in the limit.

Answer by Hamidreza

However, for version 1.5, softmax_cross_entropy_with_logits_v2 must be used instead, passing its arguments as keyword arguments (key=...), for example:

tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=my_prediction,
                                           dim=-1, name=None)