CS231n: How to calculate gradient for Softmax loss function?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow.

Original question: http://stackoverflow.com/questions/41663874/
Asked by Nghia Tran
I am watching some videos for Stanford CS231n: Convolutional Neural Networks for Visual Recognition, but do not quite understand how to calculate the analytical gradient of the softmax loss function using numpy.
From this stackexchange answer, the softmax gradient is calculated as:

\partial L_i / \partial f_k = p_k - 1{y_i = k}, where p_k = e^{f_k} / \sum_j e^{f_j}
The Python implementation for the above is:

num_classes = W.shape[0]
num_train = X.shape[1]
for i in range(num_train):
  for j in range(num_classes):
    p = np.exp(f_i[j])/sum_i
    dW[j, :] += (p-(j == y[i])) * X[:, i]
Could anyone explain how the above snippet works? A detailed implementation of softmax is also included below.
def softmax_loss_naive(W, X, y, reg):
  """
  Softmax loss function, naive implementation (with loops)
  Inputs:
  - W: C x D array of weights
  - X: D x N array of data. Data are D-dimensional columns
  - y: 1-dimensional array of length N with labels 0...K-1, for K classes
  - reg: (float) regularization strength
  Returns:
  a tuple of:
  - loss as single float
  - gradient with respect to weights W, an array of same size as W
  """
  # Initialize the loss and gradient to zero.
  loss = 0.0
  dW = np.zeros_like(W)

  #############################################################################
  # Compute the softmax loss and its gradient using explicit loops.           #
  # Store the loss in loss and the gradient in dW. If you are not careful     #
  # here, it is easy to run into numeric instability. Don't forget the        #
  # regularization!                                                           #
  #############################################################################

  # Get shapes
  num_classes = W.shape[0]
  num_train = X.shape[1]

  for i in range(num_train):
    # Compute vector of scores
    f_i = W.dot(X[:, i])  # in R^{num_classes}

    # Normalization trick to avoid numerical instability, per http://cs231n.github.io/linear-classify/#softmax
    log_c = np.max(f_i)
    f_i -= log_c

    # Compute loss (and add to it, divided later)
    # L_i = - f(x_i)_{y_i} + log \sum_j e^{f(x_i)_j}
    sum_i = 0.0
    for f_i_j in f_i:
      sum_i += np.exp(f_i_j)
    loss += -f_i[y[i]] + np.log(sum_i)

    # Compute gradient
    # dw_j = 1/num_train * \sum_i[x_i * (p(y_i = j)-Ind{y_i = j} )]
    # Here we are computing the contribution to the inner sum for a given i.
    for j in range(num_classes):
      p = np.exp(f_i[j])/sum_i
      dW[j, :] += (p-(j == y[i])) * X[:, i]

  # Compute average
  loss /= num_train
  dW /= num_train

  # Regularization
  loss += 0.5 * reg * np.sum(W * W)
  dW += reg*W

  return loss, dW
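One way to sanity-check an analytic gradient like dW above is to compare it against a centered-difference numerical gradient on a tiny random problem. The following is a hypothetical test harness of my own (not part of the original assignment code), assuming the same C x D / D x N shape conventions as softmax_loss_naive:

import numpy as np

# Tiny random problem: C classes, D features, N examples
C, D, N = 3, 4, 5
rng = np.random.default_rng(42)
W = rng.normal(scale=0.01, size=(C, D))
X = rng.normal(size=(D, N))
y = rng.integers(0, C, size=N)
reg = 0.1

loss, dW = softmax_loss_naive(W, X, y, reg)

# Numerical gradient via centered differences
h = 1e-5
dW_num = np.zeros_like(W)
for idx in np.ndindex(W.shape):
    W[idx] += h
    loss_plus, _ = softmax_loss_naive(W, X, y, reg)
    W[idx] -= 2 * h
    loss_minus, _ = softmax_loss_naive(W, X, y, reg)
    W[idx] += h
    dW_num[idx] = (loss_plus - loss_minus) / (2 * h)

print(np.max(np.abs(dW - dW_num)))  # should be very small (roughly 1e-8 or less)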
Accepted answer by Ben Barsdell
Not sure if this helps, but:

The term 1{y_i = j} is really the indicator function, as described here. This forms the expression (j == y[i]) in the code.
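For example, a toy snippet of my own (not from the original answer) shows how the boolean comparison behaves as 0 or 1 when used in arithmetic:

import numpy as np

y = np.array([2, 0, 1])    # hypothetical labels for three training examples
i, j = 0, 2                # example i = 0, class j = 2
print(j == y[i])           # True: j is the correct class for this example
print((j == y[i]) * 1.0)   # 1.0, so the correct class contributes (p - 1) * x_i
print((1 == y[i]) * 1.0)   # 0.0, so every other class contributes p * x_i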
Also, the gradient of the loss with respect to the weights is:

\nabla_{w_j} L_i = (\partial L_i / \partial f_j) (\partial f_j / \partial w_j) = (p_j - 1{y_i = j}) x_i

where

f_j = w_j \cdot x_i, so \partial f_j / \partial w_j = x_i,

which is the origin of the X[:,i] in the code.
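As a sketch of that per-example gradient (the variable names and toy sizes below are mine, not from the answer), the inner class loop can be replaced by forming the probability vector, subtracting 1 at the true class, and taking an outer product with x_i:

import numpy as np

# Hypothetical toy sizes: C classes, D features
C, D = 4, 5
rng = np.random.default_rng(0)
W = rng.normal(size=(C, D))
x_i = rng.normal(size=D)
y_i = 2

f_i = W.dot(x_i)                      # scores, shape (C,)
f_i -= f_i.max()                      # numerical-stability shift
p = np.exp(f_i) / np.exp(f_i).sum()   # softmax probabilities, shape (C,)
p[y_i] -= 1.0                         # subtract the indicator at the true class
dW_i = np.outer(p, x_i)               # (C, D) contribution of this one example to dW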
Answer by Jawher.B
I know this is late but here's my answer:

I'm assuming you are familiar with the cs231n Softmax loss function. We know that:

L_i = -log( e^{f_{y_i}} / \sum_j e^{f_j} ) = -f_{y_i} + log \sum_j e^{f_j}

So just as we did with the SVM loss function, the gradients are as follows:

\nabla_{w_{y_i}} L_i = (p_{y_i} - 1) x_i
\nabla_{w_j} L_i = p_j x_i   (for j != y_i)

where p_j = e^{f_j} / \sum_k e^{f_k}.
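A fully vectorized version along these lines could look like the sketch below. It follows the same C x D / D x N shape conventions as the question's code, but it is my own illustration, not the official assignment solution:

import numpy as np

def softmax_loss_vectorized(W, X, y, reg):
  # W: C x D weights, X: D x N data, y: length-N integer labels
  num_train = X.shape[1]
  f = W.dot(X)                                   # C x N scores
  f -= f.max(axis=0, keepdims=True)              # numerical stability
  exp_f = np.exp(f)
  p = exp_f / exp_f.sum(axis=0, keepdims=True)   # C x N softmax probabilities
  loss = -np.log(p[y, np.arange(num_train)]).sum() / num_train
  loss += 0.5 * reg * np.sum(W * W)
  dscores = p.copy()
  dscores[y, np.arange(num_train)] -= 1          # subtract the indicator at true classes
  dW = dscores.dot(X.T) / num_train + reg * W    # C x D gradient
  return loss, dW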
Hope that helped.