Python numpy: calculate the derivative of the softmax function

Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original StackOverflow post: http://stackoverflow.com/questions/40575841/

Date: 2020-08-19 23:41:01  Source: igfitidea

numpy : calculate the derivative of the softmax function

python, numpy, neural-network, backpropagation, softmax

Asked by Sam Hammamy

I am trying to understand backpropagation in a simple 3-layer neural network with MNIST.

There is the input layer with weights and a bias. The labels are MNIST, so it's a 10-class vector.

The second layer is a linear transform. The third layer is the softmax activation to get the output as probabilities.

Backpropagation calculates the derivative at each step and calls this the gradient.

Previous layers append the global or previous gradient to the local gradient. I am having trouble calculating the local gradient of the softmax.

Several resources online go through the explanation of the softmax and its derivative, and even give code samples of the softmax itself:

import numpy as np

def softmax(x):
    """Compute the softmax of vector x."""
    exps = np.exp(x)
    return exps / np.sum(exps)
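
As a quick sanity check (the example vector is arbitrary), the outputs are positive and sum to 1:

x = np.array([1.0, 2.0, 3.0])
p = softmax(x)
print(p)        # [0.09003057 0.24472847 0.66524096]
print(p.sum())  # 1.0 (up to floating-point error)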

The derivative is explained with respect to the cases when i = j and when i != j. This is a simple code snippet I've come up with, and I was hoping to verify my understanding:

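For reference (this is the standard result, stated here only to fix notation; SM denotes the softmax output), the two cases are:

$$\frac{\partial\,\mathrm{SM}_i}{\partial x_j} = \mathrm{SM}_i\,(1 - \mathrm{SM}_i)\ \text{ if } i = j, \qquad \frac{\partial\,\mathrm{SM}_i}{\partial x_j} = -\,\mathrm{SM}_i\,\mathrm{SM}_j\ \text{ if } i \neq j$$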

def softmax(self, x):
    """Compute the softmax of vector x."""
    exps = np.exp(x)
    return exps / np.sum(exps)

def forward(self):
    # self.input is a vector of length 10
    # and is the output of 
    # (w * x) + b
    self.value = self.softmax(self.input)

def backward(self):
    for i in range(len(self.value)):
        for j in range(len(self.input)):
            if i == j:
                self.gradient[i] = self.value[i] * (1-self.input[i])
            else:
                self.gradient[i] = -self.value[i]*self.input[j]

Then self.gradient is the local gradient, which is a vector. Is this correct? Is there a better way to write this?

Answered by Wasi Ahmad

I am assuming you have a 3-layer NN where W1, b1 are associated with the linear transformation from the input layer to the hidden layer, and W2, b2 are associated with the linear transformation from the hidden layer to the output layer. Z1 and Z2 are the input vectors to the hidden layer and the output layer, a1 and a2 are the outputs of the hidden layer and the output layer, and a2 is your predicted output. delta3 and delta2 are the (backpropagated) errors, from which you get the gradients of the loss function with respect to the model parameters.

[Images: the forward-pass equations and the backpropagated gradients, not reproduced here]

This is the general scenario for a 3-layer NN (input layer, one hidden layer and one output layer). You can follow the procedure described above to compute the gradients, which should be easy to compute! Since another answer to this post has already pointed out the problem in your code, I am not repeating it here.

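Since the images do not render here, below is a minimal sketch of the equations such a derivation typically arrives at. The tanh hidden activation and the cross-entropy loss are my assumptions (the answer does not name them); softmax combined with cross-entropy is what makes delta3 = a2 - y so simple.

import numpy as np

def softmax(z):
    """Row-wise numerically stable softmax."""
    exps = np.exp(z - z.max(axis=1, keepdims=True))
    return exps / exps.sum(axis=1, keepdims=True)

def gradients(x, y, W1, b1, W2, b2):
    """One forward/backward pass; x is (N, D), y is one-hot (N, C)."""
    # Forward pass
    Z1 = x.dot(W1) + b1           # input to the hidden layer
    a1 = np.tanh(Z1)              # hidden layer output (tanh assumed)
    Z2 = a1.dot(W2) + b2          # input to the output layer
    a2 = softmax(Z2)              # predicted probabilities

    # Backward pass
    delta3 = a2 - y                           # output-layer error (softmax + cross-entropy)
    dW2 = a1.T.dot(delta3)
    db2 = delta3.sum(axis=0)
    delta2 = delta3.dot(W2.T) * (1 - a1**2)   # backpropagate through tanh
    dW1 = x.T.dot(delta2)
    db1 = delta2.sum(axis=0)
    return dW1, db1, dW2, db2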

Answered by Julien

As I said, you have n^2 partial derivatives.

If you do the math, you find that dSM[i]/dx[k] is SM[i] * (dx[i]/dx[k] - SM[k]), so you should have:

if i == j:
    self.gradient[i,j] = self.value[i] * (1-self.value[i])
else: 
    self.gradient[i,j] = -self.value[i] * self.value[j]

instead of

if i == j:
    self.gradient[i] = self.value[i] * (1-self.input[i])
else:
    self.gradient[i] = -self.value[i]*self.input[j]

By the way, this may be computed more concisely like so (vectorized):

SM = self.value.reshape((-1,1))                    # softmax output as a column vector
jac = np.diagflat(self.value) - np.dot(SM, SM.T)   # Jacobian: diag(s) - s @ s.T
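
A quick way to check that the loop version (with the fix above) and the vectorized version agree; the hard-coded vector below simply stands in for self.value:

import numpy as np

s = np.array([0.1, 0.2, 0.3, 0.4])   # any softmax output
n = len(s)
loop_jac = np.empty((n, n))
for i in range(n):
    for j in range(n):
        loop_jac[i, j] = s[i] * (1 - s[i]) if i == j else -s[i] * s[j]

SM = s.reshape((-1, 1))
vec_jac = np.diagflat(s) - np.dot(SM, SM.T)
print(np.allclose(loop_jac, vec_jac))  # True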

Answered by Haesun Park

np.exp is not stable because it can produce Inf (overflow). So you should subtract the maximum of x first.

def softmax(x):
    """Compute the softmax of vector x."""
    exps = np.exp(x - x.max())
    return exps / np.sum(exps)

If x is a matrix, please check the softmax function in this notebook.

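The notebook itself is not reproduced here, but a row-wise version for a 2-D x might look like the sketch below (softmax_2d is my name, not the notebook's):

import numpy as np

def softmax_2d(x):
    """Stable softmax applied row-wise to a 2-D array x of shape (n_samples, n_classes)."""
    shifted = x - x.max(axis=1, keepdims=True)   # subtract each row's maximum for stability
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)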