How to implement the Softmax function in Python

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow

Original URL: http://stackoverflow.com/questions/34968722/
Asked by alvas
From Udacity's deep learning class, the softmax of y_i is simply the exponential divided by the sum of the exponentials of the whole Y vector:
S(y_i) = e^(y_i) / Σ_j e^(y_j)

where S(y_i) is the softmax function of y_i, e is the exponential, and j is the number of columns in the input vector Y.
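For a concrete check (numbers mine, not from the original post), take Y = [3.0, 1.0, 0.2]. Then S(y_1) = e^(3.0) / (e^(3.0) + e^(1.0) + e^(0.2)) ≈ 20.09 / (20.09 + 2.72 + 1.22) ≈ 0.836, which matches the first entry of the output shown below.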
I've tried the following:
import numpy as np

def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

scores = [3.0, 1.0, 0.2]
print(softmax(scores))
which returns:
[ 0.8360188 0.11314284 0.05083836]
But the suggested solution was:
def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)
which produces the same output as the first implementation, even though the first implementation explicitly takes the difference of each column and the max and then divides by the sum.
Can someone show mathematically why? Is one correct and the other one wrong?
Are the implementations similar in terms of code and time complexity? Which is more efficient?
Accepted answer by Trevor Merrifield
They're both correct, but yours is preferred from the point of view of numerical stability.
You start with

e^(x - max(x)) / sum(e^(x - max(x)))
By using the fact that a^(b - c) = (a^b)/(a^c), we have
= e^x / (e^max(x) * sum(e^x / e^max(x)))
= e^x / sum(e^x)
Which is what the other answer says. You could replace max(x) with any variable and it would cancel out.
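As a quick numerical check of this cancellation (my own snippet, not part of the accepted answer), shifting by the max, or by any other constant, leaves the result unchanged up to floating-point rounding:

import numpy as np

x = np.array([3.0, 1.0, 0.2])

naive = np.exp(x) / np.exp(x).sum()
shifted_by_max = np.exp(x - x.max()) / np.exp(x - x.max()).sum()
shifted_by_5 = np.exp(x - 5.0) / np.exp(x - 5.0).sum()

print(np.allclose(naive, shifted_by_max))  # True
print(np.allclose(naive, shifted_by_5))    # True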
Answered by Shagun Sodhani
I would say that while both are mathematically correct, implementation-wise the first one is better. When computing softmax, the intermediate values may become very large. Dividing two large numbers can be numerically unstable. These notes (from Stanford) mention a normalization trick, which is essentially what you are doing.
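To make "very large" concrete (a small illustration of my own, assuming NumPy's default float64): the exponential overflows just past e^709, so even moderately large scores break the unnormalized version:

import numpy as np

print(np.exp(709.0))  # about 8.2e+307, close to the float64 maximum
print(np.exp(710.0))  # inf, with an overflow RuntimeWarning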
Answered by Sadegh Salehi
Here you can find out why they used -max.
From there:
"When you're writing code for computing the Softmax function in practice, the intermediate terms may be very large due to the exponentials. Dividing large numbers can be numerically unstable, so it is important to use a normalization trick."
Answered by desertnaut
(Well... much confusion here, both in the question and in the answers...)
To start with, the two solutions (i.e. yours and the suggested one) are not equivalent; they happen to be equivalent only for the special case of 1-D score arrays. You would have discovered this if you had also tried the 2-D score array from the Udacity quiz example.
Results-wise, the only actual difference between the two solutions is the axis=0 argument. To see that this is the case, let's try your solution (your_softmax) and one where the only difference is the axis argument:
import numpy as np

# your solution:
def your_softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

# correct solution:
def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)  # only difference
As I said, for a 1-D score array, the results are indeed identical:
scores = [3.0, 1.0, 0.2]
print(your_softmax(scores))
# [ 0.8360188 0.11314284 0.05083836]
print(softmax(scores))
# [ 0.8360188 0.11314284 0.05083836]
your_softmax(scores) == softmax(scores)
# array([ True, True, True], dtype=bool)
Nevertheless, here are the results for the 2-D score array given in the Udacity quiz as a test example:
scores2D = np.array([[1, 2, 3, 6],
                     [2, 4, 5, 6],
                     [3, 8, 7, 6]])

print(your_softmax(scores2D))
# [[ 4.89907947e-04  1.33170787e-03  3.61995731e-03  7.27087861e-02]
#  [ 1.33170787e-03  9.84006416e-03  2.67480676e-02  7.27087861e-02]
#  [ 3.61995731e-03  5.37249300e-01  1.97642972e-01  7.27087861e-02]]

print(softmax(scores2D))
# [[ 0.09003057  0.00242826  0.01587624  0.33333333]
#  [ 0.24472847  0.01794253  0.11731043  0.33333333]
#  [ 0.66524096  0.97962921  0.86681333  0.33333333]]
The results are different: the second one is indeed identical to the one expected in the Udacity quiz, where all columns indeed sum to 1, which is not the case with the first (wrong) result.
So, all the fuss was actually about an implementation detail: the axis argument. According to the numpy.sum documentation:
The default, axis=None, will sum all of the elements of the input array
while here we want to sum row-wise, hence axis=0. For a 1-D array, the sum of the (only) row and the sum of all the elements happen to be identical, hence your identical results in that case...
The axis issue aside, your implementation (i.e. your choice to subtract the max first) is actually better than the suggested solution! In fact, it is the recommended way of implementing the softmax function - see here for the justification (numerical stability, also pointed out by some other answers here).
Answered by Pimin Konstantin Kefaloukos
A more concise version is:
def softmax(x):
    return np.exp(x) / np.exp(x).sum(axis=0)
Answered by ChuckFive
So, this is really a comment on desertnaut's answer, but I can't comment on it yet due to my reputation. As he pointed out, your version is only correct if your input consists of a single sample. If your input consists of several samples, it is wrong. However, desertnaut's solution is also wrong. The problem is that he first uses a 1-dimensional input and then a 2-dimensional one. Let me show this to you.
import numpy as np

# your solution:
def your_softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

# desertnaut's solution (copied from his answer):
def desertnaut_softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)  # only difference

# my (correct) solution:
def softmax(z):
    assert len(z.shape) == 2
    s = np.max(z, axis=1)
    s = s[:, np.newaxis]  # necessary step to do broadcasting
    e_x = np.exp(z - s)
    div = np.sum(e_x, axis=1)
    div = div[:, np.newaxis]  # ditto
    return e_x / div
Let's take desertnaut's example:
x1 = np.array([[1, 2, 3, 6]]) # notice that we put the data into 2 dimensions(!)
This is the output:
your_softmax(x1)
array([[ 0.00626879, 0.01704033, 0.04632042, 0.93037047]])
desertnaut_softmax(x1)
array([[ 1., 1., 1., 1.]])
softmax(x1)
array([[ 0.00626879, 0.01704033, 0.04632042, 0.93037047]])
You can see that desertnaut's version would fail in this situation. (It would not if the input were just one-dimensional, like np.array([1, 2, 3, 6]).)
Let's now use 3 samples, since that's the reason why we use a 2-dimensional input. The following x2 is not the same as the one from desertnaut's example.
x2 = np.array([[1, 2, 3, 6],  # sample 1
               [2, 4, 5, 6],  # sample 2
               [1, 2, 3, 6]]) # sample 1 again(!)
This input consists of a batch with 3 samples. But samples one and three are essentially the same. We now expect 3 rows of softmax activations, where the first should be the same as the third, and also the same as our activation of x1!
your_softmax(x2)
array([[ 0.00183535,  0.00498899,  0.01356148,  0.27238963],
       [ 0.00498899,  0.03686393,  0.10020655,  0.27238963],
       [ 0.00183535,  0.00498899,  0.01356148,  0.27238963]])

desertnaut_softmax(x2)
array([[ 0.21194156,  0.10650698,  0.10650698,  0.33333333],
       [ 0.57611688,  0.78698604,  0.78698604,  0.33333333],
       [ 0.21194156,  0.10650698,  0.10650698,  0.33333333]])

softmax(x2)
array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037047],
       [ 0.01203764,  0.08894682,  0.24178252,  0.65723302],
       [ 0.00626879,  0.01704033,  0.04632042,  0.93037047]])
I hope you can see that this is only the case with my solution.
softmax(x1) == softmax(x2)[0]
array([[ True, True, True, True]], dtype=bool)
softmax(x1) == softmax(x2)[2]
array([[ True, True, True, True]], dtype=bool)
Additionally, here are the results of TensorFlow's softmax implementation:
import tensorflow as tf
import numpy as np
batch = np.asarray([[1,2,3,6],[2,4,5,6],[1,2,3,6]])
x = tf.placeholder(tf.float32, shape=[None, 4])
y = tf.nn.softmax(x)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(y, feed_dict={x: batch})
And the result:
array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037045],
       [ 0.01203764,  0.08894681,  0.24178252,  0.657233  ],
       [ 0.00626879,  0.01704033,  0.04632042,  0.93037045]], dtype=float32)
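As a cross-check (my addition, not part of the original answer), the 2-D softmax defined earlier in this answer agrees with TensorFlow's output up to float32 precision:

import numpy as np

# `softmax` is the 2-D implementation defined above in this answer
batch = np.array([[1, 2, 3, 6], [2, 4, 5, 6], [1, 2, 3, 6]], dtype=np.float64)
tf_result = np.array([[0.00626879, 0.01704033, 0.04632042, 0.93037045],
                      [0.01203764, 0.08894681, 0.24178252, 0.657233  ],
                      [0.00626879, 0.01704033, 0.04632042, 0.93037045]])
print(np.allclose(softmax(batch), tf_result, atol=1e-6))  # True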
Answered by Rahul Ahuja
In order to maintain numerical stability, max(x) should be subtracted. The following is code for the softmax function:
def softmax(x):
    # note: the in-place `x -=` step mutates the caller's array
    if len(x.shape) > 1:
        # matrix case: normalize each row (axis=1)
        tmp = np.max(x, axis=1)
        x -= tmp.reshape((x.shape[0], 1))
        x = np.exp(x)
        tmp = np.sum(x, axis=1)
        x /= tmp.reshape((x.shape[0], 1))
    else:
        # vector case
        tmp = np.max(x)
        x -= tmp
        x = np.exp(x)
        tmp = np.sum(x)
        x /= tmp
    return x
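A brief usage note (mine, not from the original answer): because of the in-place x -= step, the function modifies its argument, so pass a copy if you still need the raw scores:

import numpy as np

scores = np.array([3.0, 1.0, 0.2])
print(softmax(scores.copy()))  # [0.8360188  0.11314284 0.05083836]

scores2D = np.array([[1, 2, 3, 6],
                     [2, 4, 5, 6],
                     [3, 8, 7, 6]], dtype=float)
print(softmax(scores2D.copy()))  # here each *row* sums to 1 (axis=1), unlike the axis=0 versions above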
Answered by Salvador Dali
From a mathematical point of view, both sides are equal.
And you can easily prove this. Let m = max(x). Now your function softmax returns a vector whose i-th coordinate is equal to

S(x_i) = e^(x_i - m) / Σ_j e^(x_j - m) = (e^(x_i) / e^m) / (Σ_j e^(x_j) / e^m) = e^(x_i) / Σ_j e^(x_j)

Notice that this works for any m, because for all (even complex) numbers e^m != 0.
- From a computational complexity point of view they are also equivalent: both run in O(n) time, where n is the size of the vector.
- From a numerical stability point of view, the first solution is preferred, because e^x grows very fast and will overflow even for pretty small values of x. Subtracting the maximum value gets rid of this overflow. To experience this in practice, try to feed x = np.array([1000, 5]) into both of your functions: one will return the correct probability, the second will overflow with nan (see the demonstration below).
- Your solution works only for vectors (the Udacity quiz wants you to calculate it for matrices as well). In order to fix it you need to use sum(axis=0).
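Here is what that experiment looks like (my own demonstration of the point above):

import numpy as np

x = np.array([1000.0, 5.0])

# suggested solution: np.exp(1000) overflows to inf, so the result contains nan
print(np.exp(x) / np.sum(np.exp(x), axis=0))  # [nan  0.]

# your solution: subtracting the max keeps every exponent <= 0
e_x = np.exp(x - np.max(x))
print(e_x / e_x.sum())  # [1.  0.]  (e^(5 - 1000) underflows harmlessly to 0)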
Answered by Nolan Conaway
EDIT. As of version 1.2.0, scipy includes softmax as a special function:
https://scipy.github.io/devdocs/generated/scipy.special.softmax.html
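A short usage sketch (mine), assuming scipy >= 1.2.0:

import numpy as np
from scipy.special import softmax

x = np.array([[1, 2, 3, 6],
              [2, 4, 5, 6]])
print(softmax(x, axis=1))  # each row sums to 1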
I wrote a function applying the softmax over any axis:
def softmax(X, theta=1.0, axis=None):
    """
    Compute the softmax of each element along an axis of X.

    Parameters
    ----------
    X: ND-Array. Probably should be floats.
    theta (optional): float parameter, used as a multiplier
        prior to exponentiation. Default = 1.0
    axis (optional): axis to compute values along. Default is the
        first non-singleton axis.

    Returns an array the same size as X. The result will sum to 1
    along the specified axis.
    """
    # make X at least 2d
    y = np.atleast_2d(X)
    # find axis
    if axis is None:
        axis = next(j[0] for j in enumerate(y.shape) if j[1] > 1)
    # multiply y against the theta parameter
    y = y * float(theta)
    # subtract the max for numerical stability
    y = y - np.expand_dims(np.max(y, axis=axis), axis)
    # exponentiate y
    y = np.exp(y)
    # take the sum along the specified axis
    ax_sum = np.expand_dims(np.sum(y, axis=axis), axis)
    # finally: divide elementwise
    p = y / ax_sum
    # flatten if X was 1D
    if len(X.shape) == 1:
        p = p.flatten()
    return p
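For illustration (my addition), applying it to the 2-D scores from earlier in the thread reproduces both conventions, depending on the axis:

import numpy as np

scores2D = np.array([[1, 2, 3, 6],
                     [2, 4, 5, 6],
                     [3, 8, 7, 6]])
print(softmax(scores2D, axis=0))  # columns sum to 1, as in the Udacity quiz
print(softmax(scores2D, axis=1))  # rows sum to 1, the per-sample convention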
Subtracting the max, as other users described, is good practice. I wrote a detailed post about it here.
Answered by Hao Xu
I would like to add a little more understanding of the problem. Subtracting the max of the array is correct here. But if you run the code in the other post, you would find it does not give you the right answer when the array is 2-D or higher-dimensional.
Here I give you some suggestions:
- To get the max, compute it along an axis; you will get a 1-D array.
- Reshape your max array to the original shape.
- Do np.exp to get the exponential values.
- Do np.sum along the axis.
- Get the final results.
If you follow these steps and vectorize them, you will get the correct answer. Since it is related to college homework, I cannot post the exact code here, but I am happy to give more suggestions if you don't understand. (A minimal sketch of the steps follows below.)
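Since the author deliberately withholds the exact code, here is a minimal sketch of the five steps above (my own, with a hypothetical name, normalizing along axis 1 of a 2-D array, not the homework solution):

import numpy as np

def softmax_2d(x):
    # 1. get the max along the axis -> 1-D array
    m = np.max(x, axis=1)
    # 2. reshape the max so it broadcasts against the original shape
    m = m.reshape(x.shape[0], 1)
    # 3. np.exp on the shifted values
    e = np.exp(x - m)
    # 4. np.sum along the same axis, again reshaped for broadcasting
    s = np.sum(e, axis=1).reshape(x.shape[0], 1)
    # 5. the final result: each row sums to 1
    return e / s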