Warning: this page is a mirror of a StackOverflow Q&A provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). StackOverflow original address: http://stackoverflow.com/questions/49036993/


Pytorch softmax: What dimension to use?

python, pytorch

Asked by Jadiel de Armas

The function torch.nn.functional.softmax takes two parameters: input and dim. According to its documentation, the softmax operation is applied to all slices of input along the specified dim, and will rescale them so that the elements lie in the range (0, 1) and sum to 1.

Let input be:

input = torch.randn((3, 4, 5, 6))

Suppose I want the following, so that every entry in the resulting array is 1:

sum = torch.sum(input, dim = 3) # sum's size is (3, 4, 5); with keepdim=True it would be (3, 4, 5, 1)

How should I apply softmax?

softmax(input, dim = 0) # Way Number 0
softmax(input, dim = 1) # Way Number 1
softmax(input, dim = 2) # Way Number 2
softmax(input, dim = 3) # Way Number 3

My intuition tells me that it is the last one, but I am not sure. English is not my first language, and because of that the use of the word along seemed confusing to me.

I am not very clear on what "along" means, so I will use an example that could clarify things. Suppose we have a tensor of size (s1, s2, s3, s4), and I want the sum of all the entries along the last axis to be 1, as in the sum snippet above.

Accepted answer by Wasi Ahmad

The easiest way I can think of to make you understand is: say you are given a tensor of shape (s1, s2, s3, s4) and, as you mentioned, you want the sum of all the entries along the last axis to be 1.

sum = torch.sum(input, dim = 3) # input is of shape (s1, s2, s3, s4)

Then you should call the softmax as:

softmax(input, dim = 3)
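
A quick numerical check (a minimal sketch of my own, not part of the original answer) confirms this:

import torch
import torch.nn.functional as F

input = torch.randn(3, 4, 5, 6)
out = F.softmax(input, dim=3)                      # Way Number 3
sums = torch.sum(out, dim=3)                       # shape (3, 4, 5)
print(torch.allclose(sums, torch.ones(3, 4, 5)))   # True: entries along dim 3 sum to 1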

To understand easily, you can consider a 4d tensor of shape (s1, s2, s3, s4) as a 2d tensor or matrix of shape (s1*s2*s3, s4). Now if you want the matrix to contain values in each row (axis=0) or column (axis=1) that sum to 1, then you can simply call the softmax function on the 2d tensor as follows:

softmax(input, dim = 0) # normalizes values along axis 0 (each column sums to 1)
softmax(input, dim = 1) # normalizes values along axis 1 (each row sums to 1)
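
To make the flattening analogy concrete, here is a sketch (my own illustration, not from the original answer) checking that softmax over the last axis of the 4d tensor matches softmax over dim=1 of the flattened matrix:

import torch
import torch.nn.functional as F

x = torch.randn(3, 4, 5, 6)                 # (s1, s2, s3, s4)
flat = x.view(-1, 6)                        # (s1*s2*s3, s4)
a = F.softmax(x, dim=3)
b = F.softmax(flat, dim=1).view(3, 4, 5, 6)
print(torch.allclose(a, b))                 # True: the same normalization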

You can see the example that Steven mentioned in his answer.

Answer by sww

Steven's answer (as originally written) is not correct. See the snapshot below. It is actually the reverse way.

[snapshot omitted: a console screenshot comparing softmax results for dim=0 and dim=1]
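
Since the snapshot is not reproduced here, a small sketch (my own code, not from the original answer) of the comparison it showed:

import torch
import torch.nn.functional as F

x = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])
print(F.softmax(x, dim=0))   # each column sums to 1
# tensor([[0.1192, 0.1192],
#         [0.8808, 0.8808]])
print(F.softmax(x, dim=1))   # each row sums to 1
# tensor([[0.2689, 0.7311],
#         [0.2689, 0.7311]])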

Answer by Pinocchio

I am not 100% sure what your question means but I think your confusion is simply that you don't understand what the dim parameter means. So I will explain it and provide examples.

If we have:

m0 = nn.Softmax(dim=0)

what that means is that m0 will normalize elements along the zeroth coordinate of the tensor it receives. Formally, if given a tensor b of size, say, (d0, d1), then the following will be true:

\sum_{i0=0}^{d0-1} b[i0, i1] = 1, \quad \forall i1 \in \{0, ..., d1-1\}

you can easily check this with a Pytorch example:

>>> b = torch.arange(0,4,1.0).view(-1,2)
>>> b 
tensor([[0., 1.],
        [2., 3.]])
>>> m0 = nn.Softmax(dim=0) 
>>> b0 = m0(b)
>>> b0 
tensor([[0.1192, 0.1192],
        [0.8808, 0.8808]])

Now, since dim=0 means going through i0 \in {0, 1} (i.e. going through the rows), if we choose any column i1 and sum its elements across the rows, then we should get 1. Check it:

>>> b0[:,0].sum()
tensor(1.0000)
>>> b0[:,1].sum()
tensor(1.0000)

as expected.

Note we can confirm all of these sums at once by "summing out the rows" with torch.sum(b0, dim=0), which should give a tensor of ones; check it out:

>>> torch.sum(b0,0)
tensor([1.0000, 1.0000])
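
For contrast (my own addition, continuing the session above), dim=1 normalizes each row of b instead:

>>> m1 = nn.Softmax(dim=1)
>>> b1 = m1(b)
>>> b1
tensor([[0.2689, 0.7311],
        [0.2689, 0.7311]])
>>> torch.sum(b1, 1)
tensor([1., 1.])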


We can create a more complicated example to make sure it's really clear.

>>> a = torch.arange(0,24,1.0).view(-1,3,4)
>>> a
tensor([[[ 0.,  1.,  2.,  3.],
         [ 4.,  5.,  6.,  7.],
         [ 8.,  9., 10., 11.]],

        [[12., 13., 14., 15.],
         [16., 17., 18., 19.],
         [20., 21., 22., 23.]]])
>>> a0 = m0(a)
>>> a0[:,0,0].sum()
tensor(1.0000)
>>> a0[:,1,0].sum()
tensor(1.0000)
>>> a0[:,2,0].sum()
tensor(1.0000)
>>> a0[:,1,1].sum()
tensor(1.0000)
>>> a0[:,2,3].sum()
tensor(1.0000)

So, as we expected, if we sum all the elements along the first coordinate, from the first value to the last, we get 1. So everything is normalized along the first dimension (or first coordinate i0).

>>> torch.sum(a0,0)
tensor([[1.0000, 1.0000, 1.0000, 1.0000],
        [1.0000, 1.0000, 1.0000, 1.0000],
        [1.0000, 1.0000, 1.0000, 1.0000]])


Also, "along dimension 0" means that you vary the coordinate along that dimension and consider each element. It is sort of like having a for loop going through the values the first coordinate can take, i.e.:

for i0 in range(0, d0):   # vary only the first coordinate
    a[i0, i1, i2, i3]     # the remaining coordinates i1, i2, i3 stay fixed
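
A runnable version of that loop idea (a sketch of my own, reusing the 3d tensor a from above) checks every slice explicitly:

import torch
import torch.nn as nn

m0 = nn.Softmax(dim=0)
a = torch.arange(0, 24, 1.0).view(-1, 3, 4)        # shape (2, 3, 4)
a0 = m0(a)

# fix (i1, i2) and walk the first coordinate, as in the loop above
for i1 in range(a0.shape[1]):
    for i2 in range(a0.shape[2]):
        total = sum(a0[i0, i1, i2].item() for i0 in range(a0.shape[0]))
        assert abs(total - 1.0) < 1e-5             # each such slice sums to 1
print("all slices along dim 0 sum to 1")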

Answer by Steven

Let's consider an example in two dimensions:

x = [[1, 2],
     [3, 4]]

Do you want your final result to be

y = [[0.27, 0.73],
     [0.27, 0.73]]

or

y = [[0.12, 0.12],
     [0.88, 0.88]]

If it's the first option then you want dim = 1. If it's the second option you want dim = 0.

Notice that in the second example the columns, i.e. the zeroth dimension, are normalized, hence it is normalized along the zeroth dimension.

Updated 2018-07-10: to reflect that zeroth dimension refers to columns in pytorch.
