Python CS231n: How to calculate the gradient of the Softmax loss function?

Disclaimer: this page is a translation of a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me), citing the original question: http://stackoverflow.com/questions/41663874/

CS231n: How to calculate gradient for Softmax loss function?

Tags: python, numpy, softmax

Asked by Nghia Tran

I am watching some videos from Stanford CS231n: Convolutional Neural Networks for Visual Recognition, but do not quite understand how to calculate the analytical gradient of the softmax loss function using numpy.

From this Stack Exchange answer, the softmax gradient is calculated as:

p_j = e^{f_j} / \sum_k e^{f_k}

dL_i/df_j = p_j - 1{y_i = j}

The Python implementation of the above is:

num_classes = W.shape[0]
num_train = X.shape[1]
for i in range(num_train):
  for j in range(num_classes):
    # p is the softmax probability of class j for example i
    # (f_i and sum_i are computed in the full implementation below)
    p = np.exp(f_i[j])/sum_i
    # accumulate (p_j - 1{y_i = j}) * x_i into row j of the gradient
    dW[j, :] += (p-(j == y[i])) * X[:, i]

Could anyone explain how the above snippet works? The detailed implementation of the softmax loss is also included below.

import numpy as np

def softmax_loss_naive(W, X, y, reg):
  """
  Softmax loss function, naive implementation (with loops)
  Inputs:
  - W: C x D array of weights
  - X: D x N array of data. Data are D-dimensional columns
  - y: 1-dimensional array of length N with labels 0...C-1, for C classes
  - reg: (float) regularization strength
  Returns:
  a tuple of:
  - loss as single float
  - gradient with respect to weights W, an array of same size as W
  """
  # Initialize the loss and gradient to zero.
  loss = 0.0
  dW = np.zeros_like(W)

  #############################################################################
  # Compute the softmax loss and its gradient using explicit loops.           #
  # Store the loss in loss and the gradient in dW. If you are not careful     #
  # here, it is easy to run into numeric instability. Don't forget the        #
  # regularization!                                                           #
  #############################################################################

  # Get shapes
  num_classes = W.shape[0]
  num_train = X.shape[1]

  for i in range(num_train):
    # Compute vector of scores
    f_i = W.dot(X[:, i]) # in R^{num_classes}

    # Normalization trick to avoid numerical instability, per http://cs231n.github.io/linear-classify/#softmax
    log_c = np.max(f_i)
    f_i -= log_c

    # Compute loss (and add to it, divided later)
    # L_i = - f(x_i)_{y_i} + log \sum_j e^{f(x_i)_j}
    sum_i = 0.0
    for f_i_j in f_i:
      sum_i += np.exp(f_i_j)
    loss += -f_i[y[i]] + np.log(sum_i)

    # Compute gradient
    # dw_j = 1/num_train * \sum_i[x_i * (p(y_i = j)-Ind{y_i = j} )]
    # Here we are computing the contribution to the inner sum for a given i.
    for j in range(num_classes):
      p = np.exp(f_i[j])/sum_i
      dW[j, :] += (p-(j == y[i])) * X[:, i]

  # Compute average
  loss /= num_train
  dW /= num_train

  # Regularization
  loss += 0.5 * reg * np.sum(W * W)
  dW += reg*W

  return loss, dW
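
One common way to sanity-check an analytic gradient like dW is to compare it against a centered-difference numerical gradient on a tiny random problem. The sketch below is illustrative only: it assumes softmax_loss_naive is defined exactly as above and uses the shapes from the docstring (W is C x D, X is D x N) with made-up sizes.

import numpy as np

np.random.seed(0)
C, D, N = 3, 5, 10                       # classes, feature dimension, training examples
W = 0.01 * np.random.randn(C, D)
X = np.random.randn(D, N)
y = np.random.randint(C, size=N)
reg = 0.1

loss, dW = softmax_loss_naive(W, X, y, reg)

# Numerical gradient via centered finite differences on a few random entries of W.
h = 1e-5
for _ in range(5):
    i, j = np.random.randint(C), np.random.randint(D)
    W[i, j] += h
    loss_plus, _ = softmax_loss_naive(W, X, y, reg)
    W[i, j] -= 2 * h
    loss_minus, _ = softmax_loss_naive(W, X, y, reg)
    W[i, j] += h                         # restore the original weight
    numeric = (loss_plus - loss_minus) / (2 * h)
    print("analytic %.6f  numeric %.6f" % (dW[i, j], numeric))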

Accepted answer by Ben Barsdell

Not sure if this helps, but:

The 1{y_i = j} term is the indicator function (it equals 1 when j is the correct class y_i, and 0 otherwise), as described here. This forms the expression (j == y[i]) in the code.
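
Concretely, a Python boolean used in arithmetic evaluates to 1 or 0, which is why (j == y[i]) implements that indicator. A tiny illustrative snippet (the values are made up):

y_i = 2                          # suppose the correct class for example i is 2
for j in range(3):
    print(j, (j == y_i) * 1.0)   # prints 0 0.0, then 1 0.0, then 2 1.0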

Also, the gradient of the loss with respect to the weights is:

dL_i/dw_j = dL_i/df_j * df_j/dw_j = (p_j - 1{y_i = j}) * x_i

where

f_j = w_j · x_i   (so df_j/dw_j = x_i)

which is the origin of the X[:,i] in the code.
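
Putting those pieces together, the per-example contribution computed in the double loop can also be written as a single matrix product. Below is a minimal vectorized sketch (not from the original answer), assuming the same data layout as the question, i.e. W of shape C x D and X of shape D x N; the regularization term is left out:

import numpy as np

def softmax_grad_vectorized(W, X, y):
    # Scores for all examples: column i of F is f(x_i) = W x_i, so F has shape C x N.
    F = W.dot(X)
    F -= F.max(axis=0, keepdims=True)      # subtract the per-example max for numerical stability
    P = np.exp(F)
    P /= P.sum(axis=0, keepdims=True)      # P[j, i] = p(y_i = j)
    # Subtract the indicator 1{y_i = j}: 1 at the true class of each example.
    P[y, np.arange(X.shape[1])] -= 1.0
    # dW[j, :] = (1/N) * sum_i (p_ij - 1{y_i = j}) * x_i  ==  (P X^T) / N
    dW = P.dot(X.T) / X.shape[1]
    # add reg * W here to match the regularized gradient in softmax_loss_naive
    return dW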

Answered by Jawher.B

I know this is late but here's my answer:

I'm assuming you are familiar with the cs231n Softmax loss function. We know that:

L_i = -log( e^{f_{y_i}} / \sum_j e^{f_j} ) = -f_{y_i} + log \sum_j e^{f_j}

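One practical note when evaluating that expression: the exponentials can overflow for large scores, which is why the implementation in the question subtracts the maximum score before exponentiating. A tiny sketch of that trick with made-up numbers:

import numpy as np

f = np.array([1000.0, 1005.0, 995.0])      # made-up scores; np.exp(f) would overflow to inf
f_shifted = f - np.max(f)                  # shifting by a constant does not change the softmax
p = np.exp(f_shifted) / np.sum(np.exp(f_shifted))
print(p)                                   # well-defined probabilities, roughly [0.0067, 0.9933, 0.000045]
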
So just as we did with the SVM loss function, the gradients are as follows:

dL_i/dw_{y_i} = (p_{y_i} - 1) * x_i
dL_i/dw_j     =  p_j * x_i            (for j != y_i)

where p_j = e^{f_j} / \sum_k e^{f_k}.

Hope that helped.
