Python numpy: calculate the derivative of the softmax function
Original URL: http://stackoverflow.com/questions/40575841/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
numpy : calculate the derivative of the softmax function
Asked by Sam Hammamy
I am trying to understand backpropagation in a simple 3-layered neural network with MNIST.
There is the input layer with weights and a bias. The labels are MNIST, so it's a 10-class vector.
The second layer is a linear transform. The third layer is the softmax activation to get the output as probabilities.
Backpropagation calculates the derivative at each step and calls this the gradient.
Previous layers append the global or previous gradient to the local gradient. I am having trouble calculating the local gradient of the softmax.
Several resources online go through the explanation of the softmax and its derivatives, and even give code samples of the softmax itself:
import numpy as np

def softmax(x):
    """Compute the softmax of vector x."""
    exps = np.exp(x)
    return exps / np.sum(exps)
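As a quick usage check (this snippet is my own illustration, not from the original post), the outputs behave like probabilities:

x = np.array([1.0, 2.0, 3.0])
p = softmax(x)
print(p)         # roughly [0.09003057 0.24472847 0.66524096]
print(p.sum())   # 1.0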
The derivative is explained with respect to when i = j and when i != j. This is a simple code snippet I've come up with and was hoping to verify my understanding:
def softmax(self, x):
    """Compute the softmax of vector x."""
    exps = np.exp(x)
    return exps / np.sum(exps)

def forward(self):
    # self.input is a vector of length 10
    # and is the output of
    # (w * x) + b
    self.value = self.softmax(self.input)

def backward(self):
    for i in range(len(self.value)):
        for j in range(len(self.input)):
            if i == j:
                self.gradient[i] = self.value[i] * (1-self.input[i])
            else:
                self.gradient[i] = -self.value[i]*self.input[j]
Then self.gradient is the local gradient, which is a vector. Is this correct? Is there a better way to write this?
Answered by Wasi Ahmad
I am assuming you have a 3-layer NN where W1, b1 are associated with the linear transformation from the input layer to the hidden layer, and W2, b2 are associated with the linear transformation from the hidden layer to the output layer. Z1 and Z2 are the input vectors to the hidden layer and output layer. a1 and a2 represent the outputs of the hidden layer and output layer. a2 is your predicted output. delta3 and delta2 are the (backpropagated) errors, and you can see the gradients of the loss function with respect to the model parameters.
This is a general scenario for a 3-layer NN (input layer, only one hidden layer and one output layer). You can follow the procedure described above to compute the gradients, which should be easy to compute! Since another answer to this post already pointed to the problem in your code, I am not repeating the same.
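The gradient equations this answer refers to are not reproduced on this page, so here is a minimal sketch for illustration (my own code, not part of the original answer). It assumes a cross-entropy loss on the softmax output and a tanh hidden activation, and follows the answer's naming (W1, b1, W2, b2, Z1, Z2, a1, a2, delta3, delta2). Note how, with this loss, the softmax Jacobian and the loss derivative combine into the simple delta3 = a2 - y term:

import numpy as np

def softmax(z):
    exps = np.exp(z - z.max())
    return exps / exps.sum()

def forward_backward(x, y, W1, b1, W2, b2):
    # forward pass
    Z1 = W1 @ x + b1           # input to the hidden layer
    a1 = np.tanh(Z1)           # hidden layer output (assumed activation)
    Z2 = W2 @ a1 + b2          # input to the output layer
    a2 = softmax(Z2)           # predicted probabilities

    # backward pass for the cross-entropy loss L = -sum(y * log(a2))
    delta3 = a2 - y                            # error at the output layer
    dW2 = np.outer(delta3, a1)                 # dL/dW2
    db2 = delta3                               # dL/db2
    delta2 = (W2.T @ delta3) * (1 - a1**2)     # error backpropagated to the hidden layer
    dW1 = np.outer(delta2, x)                  # dL/dW1
    db1 = delta2                               # dL/db1
    return dW1, db1, dW2, db2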
Answered by Julien
As I said, you have n^2 partial derivatives.
If you do the math, you find that dSM[i]/dx[k] is SM[i] * (dx[i]/dx[k] - SM[k]), so you should have:
if i == j:
    self.gradient[i,j] = self.value[i] * (1-self.value[i])
else:
    self.gradient[i,j] = -self.value[i] * self.value[j]
instead of
if i == j:
    self.gradient[i] = self.value[i] * (1-self.input[i])
else:
    self.gradient[i] = -self.value[i]*self.input[j]
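For reference, a short derivation of that formula (a sketch added here, not part of the original answer; the dx[i]/dx[k] term is the Kronecker delta \delta_{ik}): writing S_i = e^{x_i} / \sum_l e^{x_l}, the quotient rule gives

\frac{\partial S_i}{\partial x_k}
    = \frac{\delta_{ik} e^{x_i} \sum_l e^{x_l} - e^{x_i} e^{x_k}}{\left( \sum_l e^{x_l} \right)^2}
    = S_i (\delta_{ik} - S_k),

which reduces to S_i (1 - S_i) when i = k and to -S_i S_k otherwise, matching the two branches above.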
By the way, this may be computed more concisely like so (vectorized):
SM = self.value.reshape((-1,1))
jac = np.diagflat(self.value) - np.dot(SM, SM.T)
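As a quick sanity check (my own snippet, not from the answer), the vectorized Jacobian matches the explicit double loop, and its rows sum to zero because the softmax outputs always sum to 1:

import numpy as np

value = softmax(np.array([1.0, 2.0, 3.0]))   # assumes the softmax() from the first snippet in the question

# vectorized Jacobian, as above
SM = value.reshape((-1, 1))
jac = np.diagflat(value) - np.dot(SM, SM.T)

# explicit double loop for comparison
n = len(value)
jac_loop = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i == j:
            jac_loop[i, j] = value[i] * (1 - value[i])
        else:
            jac_loop[i, j] = -value[i] * value[j]

print(np.allclose(jac, jac_loop))   # True
print(jac.sum(axis=1))              # each row sums to ~0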
Answered by Haesun Park
np.exp is not stable because it can overflow to Inf, so you should subtract the maximum of x.
def softmax(x):
    """Compute the softmax of vector x."""
    exps = np.exp(x - x.max())
    return exps / np.sum(exps)
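To see the effect (my own example, not from the answer), try a vector with large entries; the naive version overflows while the shifted one stays finite:

import numpy as np

x = np.array([1000.0, 1001.0, 1002.0])

# naive softmax: np.exp(1000) overflows to inf, and inf/inf gives nan
naive = np.exp(x) / np.sum(np.exp(x))   # RuntimeWarning: overflow; result is all nan

# shifted softmax: the largest exponent becomes 0, so nothing overflows
exps = np.exp(x - x.max())
stable = exps / np.sum(exps)            # roughly [0.09003057 0.24472847 0.66524096]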
If x is a matrix, please check the softmax function in this notebook.