Python CS231n: How to calculate the gradient of the Softmax loss function?

Disclaimer: this page is a translation of a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me), citing the original question: http://stackoverflow.com/questions/41663874/

CS231n: How to calculate gradient for Softmax loss function?

Tags: python, numpy, softmax

Asked by Nghia Tran

I am watching some videos from Stanford CS231n: Convolutional Neural Networks for Visual Recognition, but do not quite understand how to calculate the analytical gradient of the softmax loss function using numpy.

From this Stack Exchange answer, the softmax gradient is calculated as:

p_j = e^{f_j} / \sum_k e^{f_k}

dL_i/df_j = p_j - 1{y_i = j}

The Python implementation of the above is:

num_classes = W.shape[0]
num_train = X.shape[1]
for i in range(num_train):
  for j in range(num_classes):
    # p is the softmax probability of class j for example i
    # (f_i and sum_i are computed in the full implementation below)
    p = np.exp(f_i[j])/sum_i
    # accumulate (p_j - 1{y_i = j}) * x_i into row j of the gradient
    dW[j, :] += (p-(j == y[i])) * X[:, i]

Could anyone explain how the above snippet works? The detailed implementation of the softmax loss is also included below.

import numpy as np

def softmax_loss_naive(W, X, y, reg):
  """
  Softmax loss function, naive implementation (with loops)
  Inputs:
  - W: C x D array of weights
  - X: D x N array of data. Data are D-dimensional columns
  - y: 1-dimensional array of length N with labels 0...C-1, for C classes
  - reg: (float) regularization strength
  Returns:
  a tuple of:
  - loss as single float
  - gradient with respect to weights W, an array of same size as W
  """
  # Initialize the loss and gradient to zero.
  loss = 0.0
  dW = np.zeros_like(W)

  #############################################################################
  # Compute the softmax loss and its gradient using explicit loops.           #
  # Store the loss in loss and the gradient in dW. If you are not careful     #
  # here, it is easy to run into numeric instability. Don't forget the        #
  # regularization!                                                           #
  #############################################################################

  # Get shapes
  num_classes = W.shape[0]
  num_train = X.shape[1]

  for i in range(num_train):
    # Compute vector of scores
    f_i = W.dot(X[:, i]) # in R^{num_classes}

    # Normalization trick to avoid numerical instability, per http://cs231n.github.io/linear-classify/#softmax
    log_c = np.max(f_i)
    f_i -= log_c

    # Compute loss (and add to it, divided later)
    # L_i = - f(x_i)_{y_i} + log \sum_j e^{f(x_i)_j}
    sum_i = 0.0
    for f_i_j in f_i:
      sum_i += np.exp(f_i_j)
    loss += -f_i[y[i]] + np.log(sum_i)

    # Compute gradient
    # dw_j = 1/num_train * \sum_i[x_i * (p(y_i = j)-Ind{y_i = j} )]
    # Here we are computing the contribution to the inner sum for a given i.
    for j in range(num_classes):
      p = np.exp(f_i[j])/sum_i
      dW[j, :] += (p-(j == y[i])) * X[:, i]

  # Compute average
  loss /= num_train
  dW /= num_train

  # Regularization
  loss += 0.5 * reg * np.sum(W * W)
  dW += reg*W

  return loss, dW
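
One common way to sanity-check an analytic gradient like dW is to compare it against a centered-difference numerical gradient on a tiny random problem. The sketch below is illustrative only: it assumes softmax_loss_naive is defined exactly as above and uses the shapes from the docstring (W is C x D, X is D x N) with made-up sizes.

import numpy as np

np.random.seed(0)
C, D, N = 3, 5, 10                       # classes, feature dimension, training examples
W = 0.01 * np.random.randn(C, D)
X = np.random.randn(D, N)
y = np.random.randint(C, size=N)
reg = 0.1

loss, dW = softmax_loss_naive(W, X, y, reg)

# Numerical gradient via centered finite differences on a few random entries of W.
h = 1e-5
for _ in range(5):
    i, j = np.random.randint(C), np.random.randint(D)
    W[i, j] += h
    loss_plus, _ = softmax_loss_naive(W, X, y, reg)
    W[i, j] -= 2 * h
    loss_minus, _ = softmax_loss_naive(W, X, y, reg)
    W[i, j] += h                         # restore the original weight
    numeric = (loss_plus - loss_minus) / (2 * h)
    print("analytic %.6f  numeric %.6f" % (dW[i, j], numeric))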

Accepted answer by Ben Barsdell

Not sure if this helps, but:

The 1{y_i = j} term is the indicator function (it equals 1 when j is the correct class y_i, and 0 otherwise), as described here. This forms the expression (j == y[i]) in the code.
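
Concretely, a Python boolean used in arithmetic evaluates to 1 or 0, which is why (j == y[i]) implements that indicator. A tiny illustrative snippet (the values are made up):

y_i = 2                          # suppose the correct class for example i is 2
for j in range(3):
    print(j, (j == y_i) * 1.0)   # prints 0 0.0, then 1 0.0, then 2 1.0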

Also, the gradient of the loss with respect to the weights is:

dL_i/dw_j = dL_i/df_j * df_j/dw_j = (p_j - 1{y_i = j}) * x_i

where

f_j = w_j · x_i   (so df_j/dw_j = x_i)

which is the origin of the X[:,i] in the code.
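
Putting those pieces together, the per-example contribution computed in the double loop can also be written as a single matrix product. Below is a minimal vectorized sketch (not from the original answer), assuming the same data layout as the question, i.e. W of shape C x D and X of shape D x N; the regularization term is left out:

import numpy as np

def softmax_grad_vectorized(W, X, y):
    # Scores for all examples: column i of F is f(x_i) = W x_i, so F has shape C x N.
    F = W.dot(X)
    F -= F.max(axis=0, keepdims=True)      # subtract the per-example max for numerical stability
    P = np.exp(F)
    P /= P.sum(axis=0, keepdims=True)      # P[j, i] = p(y_i = j)
    # Subtract the indicator 1{y_i = j}: 1 at the true class of each example.
    P[y, np.arange(X.shape[1])] -= 1.0
    # dW[j, :] = (1/N) * sum_i (p_ij - 1{y_i = j}) * x_i  ==  (P X^T) / N
    dW = P.dot(X.T) / X.shape[1]
    # add reg * W here to match the regularized gradient in softmax_loss_naive
    return dW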

Answered by Jawher.B

I know this is late but here's my answer:

I'm assuming you are familiar with the cs231n Softmax loss function. We know that:

L_i = -log( e^{f_{y_i}} / \sum_j e^{f_j} ) = -f_{y_i} + log \sum_j e^{f_j}

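One practical note when evaluating that expression: the exponentials can overflow for large scores, which is why the implementation in the question subtracts the maximum score before exponentiating. A tiny sketch of that trick with made-up numbers:

import numpy as np

f = np.array([1000.0, 1005.0, 995.0])      # made-up scores; np.exp(f) would overflow to inf
f_shifted = f - np.max(f)                  # shifting by a constant does not change the softmax
p = np.exp(f_shifted) / np.sum(np.exp(f_shifted))
print(p)                                   # well-defined probabilities, roughly [0.0067, 0.9933, 0.000045]
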
So just as we did with the SVM loss function, the gradients are as follows:

dL_i/dw_{y_i} = (p_{y_i} - 1) * x_i
dL_i/dw_j     =  p_j * x_i            (for j != y_i)

where p_j = e^{f_j} / \sum_k e^{f_k}.

Hope that helped.
