Python: Understanding NumPy's einsum

Notice: this page reproduces a popular StackOverflow question and its answers, originally published as a Chinese-English parallel translation, under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/26089893/

Date: 2020-08-19 00:03:43  Source: igfitidea

Understanding NumPy's einsum

python arrays numpy multidimensional-array numpy-einsum

Asked by Lance Strait

I'm struggling to understand exactly how einsum works. I've looked at the documentation and a few examples, but it doesn't seem to stick.

Here's an example we went over in class:


C = np.einsum("ij,jk->ki", A, B)

for two arrays A and B.

I think this would take A^T * B, but I'm not sure (it's taking the transpose of one of them, right?). Can anyone walk me through exactly what's happening here (and in general when using einsum)?

Accepted answer by Alex Riley

(Note: this answer is based on a short blog post about einsum I wrote a while ago.)

What does einsum do?

Imagine that we have two multi-dimensional arrays, A and B. Now let's suppose we want to...

  • multiply A with B in a particular way to create a new array of products; and then maybe
  • sum this new array along particular axes; and then maybe
  • transpose the axes of the new array in a particular order.

There's a good chance that einsum will help us do this faster and more memory-efficiently than combinations of the NumPy functions like multiply, sum and transpose will allow.

How does einsum work?

Here's a simple (but not completely trivial) example. Take the following two arrays:


import numpy as np

A = np.array([0, 1, 2])

B = np.array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11]])

We will multiply A and B element-wise and then sum along the rows of the new array. In "normal" NumPy we'd write:

>>> (A[:, np.newaxis] * B).sum(axis=1)
array([ 0, 22, 76])

So here, the indexing operation on A lines up the first axes of the two arrays so that the multiplication can be broadcast. The rows of the array of products are then summed to return the answer.

Now if we wanted to use einsum instead, we could write:

>>> np.einsum('i,ij->i', A, B)
array([ 0, 22, 76])

The signature string 'i,ij->i' is the key here and needs a little bit of explaining. You can think of it in two halves. On the left-hand side (left of the ->) we've labelled the two input arrays. To the right of ->, we've labelled the array we want to end up with.

Here is what happens next:


  • A has one axis; we've labelled it i. And B has two axes; we've labelled axis 0 as i and axis 1 as j.

  • By repeating the label i in both input arrays, we are telling einsum that these two axes should be multiplied together. In other words, we're multiplying array A with each column of array B, just like A[:, np.newaxis] * B does.

  • Notice that j does not appear as a label in our desired output; we've just used i (we want to end up with a 1D array). By omitting the label, we're telling einsum to sum along this axis. In other words, we're summing the rows of the products, just like .sum(axis=1) does.
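
As a rough illustration of the three points above, the signature 'i,ij->i' can be spelled out as explicit loops. This is only a sketch of what the labels mean, not a description of how einsum works internally; the name `out` is introduced here purely for illustration.

```python
import numpy as np

A = np.array([0, 1, 2])
B = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])

# One output slot per value of the kept label i.
out = np.zeros(A.shape[0], dtype=B.dtype)
for i in range(B.shape[0]):        # i labels axis 0 of both A and B
    for j in range(B.shape[1]):    # j is absent from the output, so it is summed
        out[i] += A[i] * B[i, j]

print(out)  # [ 0 22 76]
```

The repeated label i becomes a shared loop index, and the omitted label j becomes the loop that accumulates into the output.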

That's basically all you need to know to use einsum. It helps to play about a little; if we leave both labels in the output, 'i,ij->ij', we get back a 2D array of products (same as A[:, np.newaxis] * B). If we specify no output labels, 'i,ij->', we get back a single number (same as doing (A[:, np.newaxis] * B).sum()).
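
The two variants just mentioned can be tried directly. A minimal sketch using the same A and B as above (the names `products` and `total` are chosen here for illustration):

```python
import numpy as np

A = np.array([0, 1, 2])
B = np.arange(12).reshape(3, 4)   # same values as B above

products = np.einsum('i,ij->ij', A, B)   # keep both labels: nothing is summed
total = np.einsum('i,ij->', A, B)        # keep no labels: everything is summed

print(products.shape)  # (3, 4)
print(total)           # 98
```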

The great thing about einsum, however, is that it does not build a temporary array of products first; it just sums the products as it goes. This can lead to big savings in memory use.

A slightly bigger example


To explain the dot product, here are two new arrays:


A = np.array([[1, 1, 1],
              [2, 2, 2],
              [5, 5, 5]])

B = np.array([[0, 1, 0],
              [1, 1, 0],
              [1, 1, 1]])

We will compute the dot product using np.einsum('ij,jk->ik', A, B). Here's a picture showing the labelling of A and B and the output array that we get from the function:

[Image: the labelled arrays A and B and the output array]

You can see that label j is repeated - this means we're multiplying the rows of A with the columns of B. Furthermore, the label j is not included in the output - we're summing these products. Labels i and k are kept for the output, so we get back a 2D array.
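
Since the picture may not be visible, here is a small runnable check of the same computation, assuming the two 3x3 arrays defined above. The name `result` is only illustrative.

```python
import numpy as np

A = np.array([[1, 1, 1],
              [2, 2, 2],
              [5, 5, 5]])
B = np.array([[0, 1, 0],
              [1, 1, 0],
              [1, 1, 1]])

# j is repeated and dropped from the output: multiply along j, then sum it away.
result = np.einsum('ij,jk->ik', A, B)
print(result)
# [[ 2  3  1]
#  [ 4  6  2]
#  [10 15  5]]
```

This agrees with np.dot(A, B), which is the usual way to write the same product.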

It might be even clearer to compare this result with the array where the label j is not summed. Below, on the left you can see the 3D array that results from writing np.einsum('ij,jk->ijk', A, B) (i.e. we've kept label j):

[Image: the 3D array of products (left) and the dot product obtained by summing over axis j (right)]

Summing axis j gives the expected dot product, shown on the right.
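
The relationship between the unsummed 3D array and the dot product can be checked in a few lines. This is a sketch with the same A and B; `unsummed` and `summed` are names made up for this example.

```python
import numpy as np

A = np.array([[1, 1, 1],
              [2, 2, 2],
              [5, 5, 5]])
B = np.array([[0, 1, 0],
              [1, 1, 0],
              [1, 1, 1]])

unsummed = np.einsum('ij,jk->ijk', A, B)   # j kept: every product A[i, j] * B[j, k]
summed = unsummed.sum(axis=1)              # j is axis 1 of the 'ijk' output

print(unsummed.shape)                                          # (3, 3, 3)
print(np.array_equal(summed, np.einsum('ij,jk->ik', A, B)))    # True
```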

Some exercises


To get more of a feel for einsum, it can be useful to implement familiar NumPy array operations using the subscript notation. Anything that involves combinations of multiplying and summing axes can be written using einsum.

Let A and B be two 1D arrays with the same length. For example, A = np.arange(10) and B = np.arange(5, 15).

  • The sum of A can be written:

    np.einsum('i->', A)

  • Element-wise multiplication, A * B, can be written:

    np.einsum('i,i->i', A, B)

  • The inner product or dot product, np.inner(A, B) or np.dot(A, B), can be written:

    np.einsum('i,i->', A, B) # or just use 'i,i'

  • The outer product, np.outer(A, B), can be written:

    np.einsum('i,j->ij', A, B)
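
One way to convince yourself of the four equivalences above is to compare each einsum call with its familiar counterpart. A short sketch, using the example arrays suggested earlier (the `*_ok` names are just for illustration):

```python
import numpy as np

A = np.arange(10)
B = np.arange(5, 15)

# Each pair should agree: the einsum spelling vs. the familiar NumPy call.
sum_ok = np.einsum('i->', A) == A.sum()
mul_ok = np.array_equal(np.einsum('i,i->i', A, B), A * B)
inner_ok = np.einsum('i,i->', A, B) == np.inner(A, B)
outer_ok = np.array_equal(np.einsum('i,j->ij', A, B), np.outer(A, B))

print(sum_ok, mul_ok, inner_ok, outer_ok)  # True True True True
```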
    

For 2D arrays, C and D, provided that the axes are compatible lengths (both the same length or one of them has length 1), here are a few examples:

  • The trace of C (sum of main diagonal), np.trace(C), can be written:

    np.einsum('ii', C)

  • Element-wise multiplication of C and the transpose of D, C * D.T, can be written:

    np.einsum('ij,ji->ij', C, D)

  • Multiplying each element of C by the array D (to make a 4D array), C[:, :, None, None] * D, can be written:

    np.einsum('ij,kl->ijkl', C, D)
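
The 2D examples can be checked the same way. In this sketch the array shapes are assumptions picked for illustration: a square array `S` is used for the trace, while `C` and `D` are chosen so that D.T has the same shape as C.

```python
import numpy as np

S = np.arange(9).reshape(3, 3)    # square, so the trace is defined
C = np.arange(6).reshape(2, 3)
D = np.arange(6).reshape(3, 2)    # D.T has the same shape as C

trace_ok = np.einsum('ii', S) == np.trace(S)
transposed_mul_ok = np.array_equal(np.einsum('ij,ji->ij', C, D), C * D.T)

four_d = np.einsum('ij,kl->ijkl', C, D)
four_d_ok = np.array_equal(four_d, C[:, :, None, None] * D)

print(trace_ok, transposed_mul_ok, four_d_ok, four_d.shape)
```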
    

Answer by wwii

I found NumPy: The tricks of the trade (Part II) instructive:

We use -> to indicate the order of the output array. So think of 'ij, i->j' as having a left-hand side (LHS) and a right-hand side (RHS). Any repetition of labels on the LHS computes the product element-wise and then sums over. By changing the label on the RHS (output) side, we can define the axis in which we want to proceed with respect to the input array, i.e. summation along axis 0, 1 and so on.

import numpy as np

>>> a
array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 3]])
>>> b
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> d = np.einsum('ij, jk->ki', a, b)

Notice there are three axes, i, j, k, and that j is repeated (on the left-hand side). i,j represent rows and columns for a; j,k for b.

In order to calculate the product and align the j axis we need to add an axis to a. (b will be broadcast along(?) the first axis)

a[i, j, k]
   b[j, k]

>>> c = a[:,:,np.newaxis] * b
>>> c
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 0,  2,  4],
        [ 6,  8, 10],
        [12, 14, 16]],

       [[ 0,  3,  6],
        [ 9, 12, 15],
        [18, 21, 24]]])

j is absent from the right-hand side so we sum over j, which is the second axis of the 3x3x3 array

>>> c = c.sum(1)
>>> c
array([[ 9, 12, 15],
       [18, 24, 30],
       [27, 36, 45]])

Finally, the indices are (alphabetically) reversed on the right-hand-side so we transpose.


>>> c.T
array([[ 9, 18, 27],
       [12, 24, 36],
       [15, 30, 45]])

>>> np.einsum('ij, jk->ki', a, b)
array([[ 9, 18, 27],
       [12, 24, 36],
       [15, 30, 45]])
>>>
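
The session above shows a and b without the statements that create them. Below is a self-contained sketch of the same three steps (broadcast multiply, sum over j, transpose); the array literals are copied from the output shown above, and the name `by_hand` is made up here.

```python
import numpy as np

a = np.array([[1, 1, 1],
              [2, 2, 2],
              [3, 3, 3]])
b = np.arange(9).reshape(3, 3)

c = a[:, :, np.newaxis] * b     # axes (i, j, k): a gains a trailing k axis
by_hand = c.sum(axis=1).T       # sum over j, then reorder (i, k) -> (k, i)

d = np.einsum('ij, jk->ki', a, b)
print(np.array_equal(by_hand, d))  # True
```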

Answer by hpaulj

Let's make 2 arrays with different, but compatible, dimensions to highlight their interplay

In [43]: A=np.arange(6).reshape(2,3)
Out[43]: 
array([[0, 1, 2],
       [3, 4, 5]])


In [44]: B=np.arange(12).reshape(3,4)
Out[44]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

Your calculation takes a 'dot' (sum of products) of a (2,3) with a (3,4) to produce a (4,2) array. i is the 1st dim of A, the last of C; k is the last of B, the 1st of C. j is 'consumed' by the summation.

In [45]: C=np.einsum('ij,jk->ki',A,B)
Out[45]: 
array([[20, 56],
       [23, 68],
       [26, 80],
       [29, 92]])

This is the same as np.dot(A,B).T - it's the final output that's transposed.

To see more of what happens to j, change the C subscripts to ijk:

In [46]: np.einsum('ij,jk->ijk',A,B)
Out[46]: 
array([[[ 0,  0,  0,  0],
        [ 4,  5,  6,  7],
        [16, 18, 20, 22]],

       [[ 0,  3,  6,  9],
        [16, 20, 24, 28],
        [40, 45, 50, 55]]])

This can also be produced with:


A[:,:,None]*B[None,:,:]

That is, add a k dimension to the end of A, and an i to the front of B, resulting in a (2,3,4) array.

0 + 4 + 16 = 20, 9 + 28 + 55 = 92, etc; sum on j and transpose to get the earlier result:

np.sum(A[:,:,None] * B[None,:,:], axis=1).T

# C[k,i] = sum(j) A[i,j (,k) ] * B[(i,)  j,k]
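
Putting the pieces of this answer together, the three routes to the same (4,2) result can be compared in one script. This is a sketch; `via_dot` and `via_broadcast` are names introduced for the comparison.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)

C = np.einsum('ij,jk->ki', A, B)
via_dot = np.dot(A, B).T
via_broadcast = np.sum(A[:, :, None] * B[None, :, :], axis=1).T

print(C.shape)                                                        # (4, 2)
print(np.array_equal(C, via_dot), np.array_equal(C, via_broadcast))   # True True
```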

Answer by kmario23

Grasping the idea of numpy.einsum() is very easy if you understand it intuitively. As an example, let's start with a simple description involving matrix multiplication.



To use numpy.einsum(), all you have to do is to pass the so-called subscripts string as an argument, followed by your input arrays.

Let's say you have two 2D arrays, A and B, and you want to do matrix multiplication. So, you do:

np.einsum("ij, jk -> ik", A, B)

Here the subscript string ij corresponds to array A while the subscript string jk corresponds to array B. Also, the most important thing to note here is that the number of characters in each subscript string must match the number of dimensions of the array (i.e. two chars for 2D arrays, three chars for 3D arrays, and so on). And if you repeat the chars between subscript strings (j in our case), then that means you want the einsum to happen along those dimensions. Thus, they will be sum-reduced (i.e. that dimension will be gone), as long as the repeated char is left out of the output subscripts.

The subscript string after the -> describes our resultant array. If you leave it empty, then everything will be summed and a scalar value is returned as the result. Otherwise the resultant array will have dimensions according to the subscript string. In our example, it'll be ik. This is intuitive because we know that for matrix multiplication the number of columns in array A has to match the number of rows in array B, which is what is happening here (i.e. we encode this knowledge by repeating the char j in the subscript string).
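
The rules in the two paragraphs above can be tried out on small arrays. The sketch below uses arrays made up for the purpose (they are not the inputs used later in this answer) and relies on einsum raising a ValueError when a subscript string has more characters than the array has dimensions.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)

matmul = np.einsum("ij, jk -> ik", A, B)   # output subscripts ik: shape (2, 4)
scalar = np.einsum("ij, jk -> ", A, B)     # empty output: everything is summed

try:
    np.einsum("ijk, jk -> ik", A, B)       # three chars for a 2D array
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(matmul.shape, scalar, mismatch_rejected)
```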



Here are some more examples illustrating the use/power of np.einsum() in implementing some common tensor or nd-array operations, succinctly.

Inputs


# a vector
In [197]: vec
Out[197]: array([0, 1, 2, 3])

# an array
In [198]: A
Out[198]: 
array([[11, 12, 13, 14],
       [21, 22, 23, 24],
       [31, 32, 33, 34],
       [41, 42, 43, 44]])

# another array
In [199]: B
Out[199]: 
array([[1, 1, 1, 1],
       [2, 2, 2, 2],
       [3, 3, 3, 3],
       [4, 4, 4, 4]])
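
The session above displays vec, A and B but not the statements that created them. One possible way to build them, together with a tiny helper for checking several of the examples in this answer against their NumPy counterparts, is sketched here. The helper `same` is hypothetical and exists only for this illustration.

```python
import numpy as np

vec = np.arange(4)
A = np.array([[11, 12, 13, 14],
              [21, 22, 23, 24],
              [31, 32, 33, 34],
              [41, 42, 43, 44]])
B = np.array([[1, 1, 1, 1],
              [2, 2, 2, 2],
              [3, 3, 3, 3],
              [4, 4, 4, 4]])

def same(subscripts, expected, *operands):
    """Hypothetical helper: does einsum agree with the reference result?"""
    return np.array_equal(np.einsum(subscripts, *operands), expected)

print(same("ij, jk -> ik", np.matmul(A, B), A, B))     # matrix multiplication
print(same("ii -> i", np.diag(A), A))                  # main diagonal
print(same("ij, ij -> ij", A * B, A, B))               # Hadamard product
print(same("ii -> ", np.trace(A), A))                  # trace
print(same("ij -> ji", np.transpose(A), A))            # transpose
print(same("i, j -> ij", np.outer(vec, vec), vec, vec))  # outer product
print(same("ij -> j", np.sum(B, axis=0), B))           # sum along axis 0
```

Each line should print True.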

1) Matrix multiplication (similar to np.matmul(arr1, arr2))

In [200]: np.einsum("ij, jk -> ik", A, B)
Out[200]: 
array([[130, 130, 130, 130],
       [230, 230, 230, 230],
       [330, 330, 330, 330],
       [430, 430, 430, 430]])

2) Extract elements along the main diagonal (similar to np.diag(arr))

In [202]: np.einsum("ii -> i", A)
Out[202]: array([11, 22, 33, 44])

3) Hadamard product (i.e. element-wise product of two arrays) (similar to arr1 * arr2)

In [203]: np.einsum("ij, ij -> ij", A, B)
Out[203]: 
array([[ 11,  12,  13,  14],
       [ 42,  44,  46,  48],
       [ 93,  96,  99, 102],
       [164, 168, 172, 176]])

4) Element-wise squaring (similar to np.square(arr) or arr ** 2)

In [210]: np.einsum("ij, ij -> ij", B, B)
Out[210]: 
array([[ 1,  1,  1,  1],
       [ 4,  4,  4,  4],
       [ 9,  9,  9,  9],
       [16, 16, 16, 16]])

5) Trace (i.e. sum of main-diagonal elements) (similar to np.trace(arr))

In [217]: np.einsum("ii -> ", A)
Out[217]: 110

6) Matrix transpose (similar to np.transpose(arr))

In [221]: np.einsum("ij -> ji", A)
Out[221]: 
array([[11, 21, 31, 41],
       [12, 22, 32, 42],
       [13, 23, 33, 43],
       [14, 24, 34, 44]])

7) Outer Product (of vectors) (similar to np.outer(vec1, vec2))

In [255]: np.einsum("i, j -> ij", vec, vec)
Out[255]: 
array([[0, 0, 0, 0],
       [0, 1, 2, 3],
       [0, 2, 4, 6],
       [0, 3, 6, 9]])

8) Inner Product (of vectors) (similar to np.inner(vec1, vec2))

In [256]: np.einsum("i, i -> ", vec, vec)
Out[256]: 14

9) Sum along axis 0 (similar to np.sum(arr, axis=0))

In [260]: np.einsum("ij -> j", B)
Out[260]: array([10, 10, 10, 10])

10) Sum along axis 1 (similar to np.sum(arr, axis=1))

In [261]: np.einsum("ij -> i", B)
Out[261]: array([ 4,  8, 12, 16])

11) Batch Matrix Multiplication


In [287]: BM = np.stack((A, B), axis=0)

In [288]: BM
Out[288]: 
array([[[11, 12, 13, 14],
        [21, 22, 23, 24],
        [31, 32, 33, 34],
        [41, 42, 43, 44]],

       [[ 1,  1,  1,  1],
        [ 2,  2,  2,  2],
        [ 3,  3,  3,  3],
        [ 4,  4,  4,  4]]])

In [289]: BM.shape
Out[289]: (2, 4, 4)

# batch matrix multiply using einsum
In [292]: BMM = np.einsum("bij, bjk -> bik", BM, BM)

In [293]: BMM
Out[293]: 
array([[[1350, 1400, 1450, 1500],
        [2390, 2480, 2570, 2660],
        [3430, 3560, 3690, 3820],
        [4470, 4640, 4810, 4980]],

       [[  10,   10,   10,   10],
        [  20,   20,   20,   20],
        [  30,   30,   30,   30],
        [  40,   40,   40,   40]]])

In [294]: BMM.shape
Out[294]: (2, 4, 4)
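
To see what the shared label b does in "bij, bjk -> bik", the batched result can be compared with multiplying each matrix in the stack separately. A sketch under the assumption that A and B are the 4x4 inputs shown earlier (rebuilt here with arange so the snippet is self-contained); `per_matrix` is an illustrative name.

```python
import numpy as np

# Rebuild the 4x4 inputs shown earlier in this answer.
A = np.arange(11, 15) + 10 * np.arange(4)[:, None]     # rows 11..14, 21..24, ...
B = np.repeat(np.arange(1, 5)[:, None], 4, axis=1)     # rows of 1s, 2s, 3s, 4s

BM = np.stack((A, B), axis=0)                          # shape (2, 4, 4)
BMM = np.einsum("bij, bjk -> bik", BM, BM)

# b is kept in the output, so each batch entry is multiplied independently.
per_matrix = np.stack([BM[b] @ BM[b] for b in range(BM.shape[0])])

print(np.array_equal(BMM, per_matrix))          # True
print(np.array_equal(BMM, np.matmul(BM, BM)))   # True
```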


12) Sum along axis 2 (similar to np.sum(arr, axis=2))

In [330]: np.einsum("ijk -> ij", BM)
Out[330]: 
array([[ 50,  90, 130, 170],
       [  4,   8,  12,  16]])

13) Sum all the elements in array (similar to np.sum(arr))

In [335]: np.einsum("ijk -> ", BM)
Out[335]: 480

14) Sum over multiple axes (i.e. marginalization)
(similar to np.sum(arr, axis=(axis0, axis1, axis2, axis3, axis4, axis6, axis7)))


# 8D array
In [354]: R = np.random.standard_normal((3,5,4,6,8,2,7,9))

# keep only axis 5 (i.e. "n" here), summing over all the other axes
In [363]: esum = np.einsum("ijklmnop -> n", R)

# the same reduction with np.sum: sum over every axis except axis 5
In [364]: nsum = np.sum(R, axis=(0,1,2,3,4,6,7))

In [365]: np.allclose(esum, nsum)
Out[365]: True

15) Double Dot Products (similar to np.sum(hadamard-product), cf. 3)

In [772]: A
Out[772]: 
array([[1, 2, 3],
       [4, 2, 2],
       [2, 3, 4]])

In [773]: B
Out[773]: 
array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])

In [774]: np.einsum("ij, ij -> ", A, B)
Out[774]: 124

16) 2D and 3D array multiplication


Such a multiplication could be very useful when solving a linear system of equations (Ax = b) where you want to verify the result.

# inputs
In [115]: A = np.random.rand(3,3)
In [116]: b = np.random.rand(3, 4, 5)

# solve for x
In [117]: x = np.linalg.solve(A, b.reshape(b.shape[0], -1)).reshape(b.shape)

# 2D and 3D array multiplication :)
In [118]: Ax = np.einsum('ij, jkl', A, x)

# indeed the same!
In [119]: np.allclose(Ax, b)
Out[119]: True

In contrast, if one has to use np.matmul() for this verification, we have to do a couple of reshape operations to achieve the same result, like:

# reshape 3D array `x` to 2D, perform matmul
# then reshape the resultant array to 3D
In [123]: Ax_matmul = np.matmul(A, x.reshape(x.shape[0], -1)).reshape(x.shape)

# indeed correct!
In [124]: np.allclose(Ax, Ax_matmul)
Out[124]: True

Bonus: Read more math here: Einstein-Summation and definitely here: Tensor-Notation

Answer by Stefan Dragnev

When reading einsum equations, I've found it the most helpful to just be able to mentally boil them down to their imperative versions.


Let's start with the following (imposing) statement:


C = np.einsum('bhwi,bhwj->bij', A, B)

Working through the punctuation first we see that we have two 4-letter comma-separated blobs - bhwi and bhwj, before the arrow, and a single 3-letter blob bij after it. Therefore, the equation produces a rank-3 tensor result from two rank-4 tensor inputs.

Now, let each letter in each blob be the name of a range variable. The position at which the letter appears in the blob is the index of the axis that it ranges over in that tensor. The imperative summation that produces each element of C, therefore, has to start with three nested for loops, one for each index of C.


for b in range(...):
    for i in range(...):
        for j in range(...):
            # the variables b, i and j index C in the order of their appearance in the equation
            C[b, i, j] = ...

So, essentially, you have a for loop for every output index of C. We'll leave the ranges undetermined for now.

Next we look at the left-hand side - are there any range variables there that don't appear on the right-hand side? In our case - yes, h and w. Add an inner nested for loop for every such variable:

for b in range(...):
    for i in range(...):
        for j in range(...):
            C[b, i, j] = 0
            for h in range(...):
                for w in range(...):
                    ...

Inside the innermost loop we now have all indices defined, so we can write the actual summation and the translation is complete:


# three nested for-loops that index the elements of C
for b in range(...):
    for i in range(...):
        for j in range(...):

            # prepare to sum
            C[b, i, j] = 0

            # two nested for-loops for the two indexes that don't appear on the right-hand side
            for h in range(...):
                for w in range(...):
                    # Sum! Compare the statement below with the original einsum formula
                    # 'bhwi,bhwj->bij'

                    C[b, i, j] += A[b, h, w, i] * B[b, h, w, j]

If you've been able to follow the code thus far, then congratulations! This is all you need to be able to read einsum equations. Notice in particular how the original einsum formula maps to the final summation statement in the snippet above. The for-loops and range bounds are just fluff and that final statement is all you really need to understand what's going on.


For the sake of completeness, let's see how to determine the ranges for each range variable. Well, the range of each variable is simply the length of the dimension(s) which it indexes. Obviously, if a variable indexes more than one dimension in one or more tensors, then the lengths of each of those dimensions have to be equal. Here's the code above with the complete ranges:


# C's shape is determined by the shapes of the inputs
# b indexes both A and B, so its range can come from either A.shape or B.shape
# i indexes only A, so its range can only come from A.shape, the same is true for j and B
assert A.shape[0] == B.shape[0]
assert A.shape[1] == B.shape[1]
assert A.shape[2] == B.shape[2]
C = np.zeros((A.shape[0], A.shape[3], B.shape[3]))
for b in range(A.shape[0]): # b indexes both A and B; B.shape[0] would work equally well, since the two must match
    for i in range(A.shape[3]):
        for j in range(B.shape[3]):
            # h and w can come from either A or B
            for h in range(A.shape[1]):
                for w in range(A.shape[2]):
                    C[b, i, j] += A[b, h, w, i] * B[b, h, w, j]
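
To check that the loops above really compute the same thing as the einsum call, they can be wrapped in a function and compared against np.einsum on random inputs. This is a sketch: the function name `einsum_bhwi_bhwj_loops` and the test shapes are made up for illustration.

```python
import numpy as np

def einsum_bhwi_bhwj_loops(A, B):
    """Loop translation of np.einsum('bhwi,bhwj->bij', A, B) (illustrative name)."""
    # b, h and w index both inputs, so those three lengths must agree.
    assert A.shape[:3] == B.shape[:3]
    nb, nh, nw, ni = A.shape
    nj = B.shape[3]
    C = np.zeros((nb, ni, nj))
    for b in range(nb):
        for i in range(ni):
            for j in range(nj):
                # h and w do not appear in the output, so they are summed over.
                for h in range(nh):
                    for w in range(nw):
                        C[b, i, j] += A[b, h, w, i] * B[b, h, w, j]
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3, 4, 5))
B = rng.standard_normal((2, 3, 4, 6))

C_loops = einsum_bhwi_bhwj_loops(A, B)
C_einsum = np.einsum('bhwi,bhwj->bij', A, B)
print(C_loops.shape, np.allclose(C_loops, C_einsum))  # (2, 5, 6) True
```

The loop version is far slower than einsum; it is only meant as a reading aid for the subscripts.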