Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/34142485/

Date: 2020-08-19 14:31:07  Source: igfitidea

Difference between numpy dot() and Python 3.5+ matrix multiplication @

Tags: python, numpy, matrix-multiplication, python-3.5

Asked by blaz

I recently moved to Python 3.5 and noticed that the new matrix multiplication operator (@) sometimes behaves differently from the numpy dot operator. For example, for 3d arrays:

import numpy as np

a = np.random.rand(8,13,13)
b = np.random.rand(8,13,13)
c = a @ b  # Python 3.5+
d = np.dot(a, b)

The @ operator returns an array of shape:

c.shape
(8, 13, 13)

while the np.dot() function returns:

d.shape
(8, 13, 8, 13)

How can I reproduce the same result with numpy dot? Are there any other significant differences?

Accepted answer by Alex Riley

The @ operator calls the array's __matmul__ method, not dot. This method is also present in the API as the function np.matmul.

>>> a = np.random.rand(8,13,13)
>>> b = np.random.rand(8,13,13)
>>> np.matmul(a, b).shape
(8, 13, 13)
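
As a small illustration of the dispatch described above, the sketch below uses a hypothetical toy class (`Loud`, not part of NumPy) to show that `@` calls `__matmul__`, and then checks that `a @ b` and `np.matmul(a, b)` agree for arrays.

```python
import numpy as np


class Loud:
    """Hypothetical toy class, only here to show which method `@` calls."""

    def __matmul__(self, other):
        return "__matmul__ called"


# The @ operator dispatches to __matmul__ on the left operand.
dispatch_result = Loud() @ Loud()
print(dispatch_result)

a = np.random.rand(8, 13, 13)
b = np.random.rand(8, 13, 13)

# For ndarrays, a @ b and np.matmul(a, b) compute the same batched product.
same = np.allclose(a @ b, np.matmul(a, b))
print((a @ b).shape, same)
```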

From the documentation:

matmul differs from dot in two important ways.

  • Multiplication by scalars is not allowed.
  • Stacks of matrices are broadcast together as if the matrices were elements.
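
The two points quoted from the documentation can be tried out with a minimal sketch like the one below. The array shapes and variable names are arbitrary choices for illustration, not taken from the original answer.

```python
import numpy as np

a = np.ones((8, 13, 13))

# 1. Scalars: np.dot accepts a scalar operand (it behaves like a * 2.0),
#    while np.matmul rejects it with an exception.
scaled = np.dot(a, 2.0)
try:
    np.matmul(a, 2.0)
    matmul_accepts_scalar = True
except (ValueError, TypeError):
    matmul_accepts_scalar = False

# 2. Broadcasting: matmul treats `a` as a stack of eight 13x13 matrices.
#    A single 2-D matrix is broadcast against every matrix in the stack.
m = np.ones((13, 5))
stacked = np.matmul(a, m)

# A stack of size 1 is broadcast against the stack of size 8 as well.
one = np.ones((1, 13, 5))
broadcast = np.matmul(a, one)

print(scaled.shape, matmul_accepts_scalar, stacked.shape, broadcast.shape)
```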

The last point makes it clear that the dot and matmul methods behave differently when passed 3D (or higher dimensional) arrays. Quoting from the documentation some more:

For matmul:

If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.

For np.dot:

For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors (without complex conjugation). For N dimensions it is a sum product over the last axis of a and the second-to-last of b.
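
One way to recover the matmul result from the dot result for the arrays in the question is sketched below. It relies only on the two definitions quoted above: np.dot pairs every matrix in a with every matrix in b, so the matmul result sits on the "diagonal" where both stack indices are equal. The variable names are illustrative, and np.einsum is shown as an alternative rather than as something the original answer proposed.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((8, 13, 13))
b = rng.random((8, 13, 13))

c = a @ b          # c[i, j, m]    = sum_n a[i, j, n] * b[i, n, m]
d = np.dot(a, b)   # d[i, j, k, m] = sum_n a[i, j, n] * b[k, n, m]

# Pick the entries of d where the two stack indices coincide (i == k).
idx = np.arange(a.shape[0])
c_from_dot = d[idx, :, idx, :]

# The same batched product written explicitly with einsum.
c_einsum = np.einsum('ijn,inm->ijm', a, b)

print(c.shape, d.shape, c_from_dot.shape)
print(np.allclose(c, c_from_dot), np.allclose(c, c_einsum))
```

Note that going through np.dot computes all 8 x 8 pairwise products and then discards most of them, so for real work `@` or np.matmul is the more economical choice.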

Answer by Nathan

The answer by @ajcr explains how dot and matmul (invoked by the @ symbol) differ. By looking at a simple example, one clearly sees how the two behave differently when operating on 'stacks of matrices' or tensors.

To clarify the differences, take a 4x4 array and compute its dot product and matmul product with a 3x4x2 'stack of matrices' or tensor.

import numpy as np
fourbyfour = np.array([
                       [1,2,3,4],
                       [3,2,1,4],
                       [5,4,6,7],
                       [11,12,13,14]
                      ])


threebyfourbytwo = np.array([
                             [[2,3],[11,9],[32,21],[28,17]],
                             [[2,3],[1,9],[3,21],[28,7]],
                             [[2,3],[1,9],[3,21],[28,7]],
                            ])

print('4x4*3x4x2 dot:\n {}\n'.format(np.dot(fourbyfour, threebyfourbytwo)))
print('4x4*3x4x2 matmul:\n {}\n'.format(np.matmul(fourbyfour, threebyfourbytwo)))

The products of each operation appear below. Notice how the dot product is

...a sum product over the last axis of a and the second-to-last of b

and how the matrix product is formed by broadcasting the 4x4 matrix across the stack of matrices.

4x4*3x4x2 dot:
 [[[232 152]
  [125 112]
  [125 112]]

 [[172 116]
  [123  76]
  [123  76]]

 [[442 296]
  [228 226]
  [228 226]]

 [[962 652]
  [465 512]
  [465 512]]]

4x4*3x4x2 matmul:
 [[[232 152]
  [172 116]
  [442 296]
  [962 652]]

 [[125 112]
  [123  76]
  [228 226]
  [465 512]]

 [[125 112]
  [123  76]
  [228 226]
  [465 512]]]
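
For this particular pairing of a 2-D array with a 3-D stack, the two outputs above contain the same numbers arranged along different axes. The sketch below, which reuses the arrays from this answer, checks that swapping the first two axes of the dot result gives the matmul result. The names `via_dot` and `via_matmul` are illustrative.

```python
import numpy as np

fourbyfour = np.array([
    [1, 2, 3, 4],
    [3, 2, 1, 4],
    [5, 4, 6, 7],
    [11, 12, 13, 14],
])

threebyfourbytwo = np.array([
    [[2, 3], [11, 9], [32, 21], [28, 17]],
    [[2, 3], [1, 9], [3, 21], [28, 7]],
    [[2, 3], [1, 9], [3, 21], [28, 7]],
])

via_dot = np.dot(fourbyfour, threebyfourbytwo)        # shape (4, 3, 2)
via_matmul = np.matmul(fourbyfour, threebyfourbytwo)  # shape (3, 4, 2)

# dot indexes the result as [row of A, stack index, column];
# matmul indexes it as [stack index, row of A, column].
same_numbers = np.array_equal(via_dot.transpose(1, 0, 2), via_matmul)
print(via_dot.shape, via_matmul.shape, same_numbers)
```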

Answer by Yong Yang

In mathematics, I think the dot in numpy makes more sense:

dot(a,b)_{i,j,k,a,b,c} = \sum_m a_{i,j,k,m} b_{a,b,m,c}

since it gives the dot product when a and b are vectors, or the matrix multiplication when a and b are matrices.



As for the matmul operation in numpy, its result consists of parts of the dot result, and it can be defined as

matmul(a,b)_{i,j,k,c} = \sum_m a_{i,j,k,m} b_{i,j,m,c}

So, you can see that matmul(a,b) returns an array with a smaller shape, which consumes less memory and makes more sense in applications. In particular, combined with broadcasting, you can get, for example,

matmul(a,b)_{i,j,k,l} = \sum_m a_{i,j,k,m} b_{j,m,l}



From the above two definitions, you can see the requirements for using these two operations. Assume a.shape=(s1,s2,s3,s4) and b.shape=(t1,t2,t3,t4).

  • To use dot(a,b) you need

    1. t3=s4
  • To use matmul(a,b) you need

    1. t3=s4
    2. t2=s2, or one of t2 and s2 is 1
    3. t1=s1, or one of t1 and s1 is 1
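
The shape requirements listed above can be checked with a short sketch. The concrete sizes below are arbitrary values chosen for illustration; only the relationships between them matter.

```python
import numpy as np

a = np.zeros((2, 3, 4, 5))    # (s1, s2, s3, s4)

# t3 == s4, but t1 and t2 neither match s1 and s2 nor equal 1.
b = np.zeros((6, 7, 5, 8))    # (t1, t2, t3, t4)

# dot only needs t3 == s4; every other axis is kept in the output.
dot_shape = np.dot(a, b).shape

# matmul also needs the leading (stack) axes to be broadcastable.
try:
    np.matmul(a, b)
    matmul_failed = False
except ValueError:
    matmul_failed = True

# With t1 == 1 and t2 == s2 the stacks broadcast and matmul succeeds.
b_ok = np.zeros((1, 3, 5, 8))
matmul_shape = np.matmul(a, b_ok).shape

print(dot_shape, matmul_failed, matmul_shape)
```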


Use the following piece of code to convince yourself.

Code sample

import numpy as np

for it in range(10000):
    a = np.random.rand(5, 6, 2, 4)
    b = np.random.rand(6, 4, 3)
    c = np.matmul(a, b)  # shape (5, 6, 2, 3)
    d = np.dot(a, b)     # shape (5, 6, 2, 6, 3)
    # print('c shape:', c.shape, 'd shape:', d.shape)

    for i in range(5):
        for j in range(6):
            for k in range(2):
                for l in range(3):
                    # Compare with a tolerance, since the two routines may round differently.
                    if not np.isclose(c[i, j, k, l], d[i, j, k, j, l]):
                        print(it, i, j, k, l, c[i, j, k, l], d[i, j, k, j, l])  # you will not see them
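
The same check can be written without the nested loops. The sketch below is an alternative formulation, not part of the original answer: it uses np.einsum with a repeated index to pull out the entries d[i, j, k, j, l] in one step and compares them to the matmul result.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((5, 6, 2, 4))
b = rng.random((6, 4, 3))

c = np.matmul(a, b)   # shape (5, 6, 2, 3)
d = np.dot(a, b)      # shape (5, 6, 2, 6, 3)

# The repeated index j selects the entries where axis 1 and axis 3 agree,
# which gives d_diag[i, j, k, l] = d[i, j, k, j, l].
d_diag = np.einsum('ijkjl->ijkl', d)

all_match = np.allclose(c, d_diag)
print(c.shape, d.shape, d_diag.shape, all_match)
```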

Answer by Nico Schlömer

Just FYI, @ and its numpy equivalents dot and matmul are all roughly equally fast. (Plot created with perfplot, a project of mine.)

Code to reproduce the plot:

import perfplot
import numpy


def setup(n):
    A = numpy.random.rand(n, n)
    x = numpy.random.rand(n)
    return A, x


def at(data):
    A, x = data
    return A @ x


def numpy_dot(data):
    A, x = data
    return numpy.dot(A, x)


def numpy_matmul(data):
    A, x = data
    return numpy.matmul(A, x)


perfplot.show(
    setup=setup,
    kernels=[at, numpy_dot, numpy_matmul],
    n_range=[2 ** k for k in range(12)],
    logx=True,
    logy=True,
)

Answer by Sambath Parthasarathy

My experience with MATMUL and DOT

I was constantly getting "ValueError: Shape of passed values is (200, 1), indices imply (200, 3)" when trying to use MATMUL on a pandas DataFrame. I wanted a quick workaround and found that DOT delivers the same functionality. I don't get any error using DOT, and I get the correct answer.

with MATMUL

>>> X.shape
(200, 3)
>>> type(X)
pandas.core.frame.DataFrame
>>> w
array([0.37454012, 0.95071431, 0.73199394])
>>> YY = np.matmul(X, w)
ValueError: Shape of passed values is (200, 1), indices imply (200, 3)

with DOT

>>> YY = np.dot(X, w)  # no error message
>>> YY
array([ 2.59206877,  1.06842193,  2.18533396,  2.11366346,  0.28505879, …
>>> YY.shape
(200,)