Notice: this page is a Chinese/English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must likewise follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/34113608/

Date: 2020-09-14 00:20:08  Source: igfitidea

How to do a matrix product of two DataFrames in pandas?

Tags: python, python-2.7, numpy, pandas, matrix-multiplication

Asked by Spandyie

I am very new to Python, having recently migrated from Matlab. Is there a command in Python (pandas or NumPy) that does Matlab-like matrix multiplication of two DataFrames created with pandas?
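
Since the question comes from Matlab, it may help to spell out which operator means what. In Matlab, * is the matrix product and .* is element-wise; on pandas DataFrames, * is element-wise, so the matrix product needs dot. A minimal sketch with two small made-up 2x2 frames (a and b are illustrative names, not from the question):

```python
import pandas as pd

# Two small illustrative 2x2 frames (hypothetical example data).
a = pd.DataFrame([[1, 2], [3, 4]])
b = pd.DataFrame([[5, 6], [7, 8]])

# Element-wise product, like Matlab's a .* b
elementwise = a * b

# Matrix product, like Matlab's a * b
matrix = a.dot(b)

print(elementwise)
print(matrix)
```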

Accepted answer by Alexander

Use dot:

import numpy as np
import pandas as pd

np.random.seed(0)

# Numpy
m1 = np.random.randn(5, 5)
m2 = np.random.randn(5, 5)

>>> m1.dot(m2)
array([[ -5.51837355,  -4.08559942,  -1.88020209,   2.88961281,
          0.61755013],
       [  1.4732264 ,  -0.2394676 ,  -0.34717755,  -4.18527913,
         -1.75550855],
       [ -0.1871964 ,   0.76399007,  -0.26550057,  -3.43359244,
         -0.68081106],
       [ -0.23996774,   0.95331428,  -2.833788  ,  -0.37940614,
          0.05464387],
       [  3.73328914,  -0.59578959,   3.96803224, -10.65362381,
         -4.34460348]])

# Pandas
df1 = pd.DataFrame(m1)
df2 = pd.DataFrame(m2)

>>> df1.dot(df2)
          0         1         2          3         4
0 -5.518374 -4.085599 -1.880202   2.889613  0.617550
1  1.473226 -0.239468 -0.347178  -4.185279 -1.755509
2 -0.187196  0.763990 -0.265501  -3.433592 -0.680811
3 -0.239968  0.953314 -2.833788  -0.379406  0.054644
4  3.733289 -0.595790  3.968032 -10.653624 -4.344603

df3 = pd.DataFrame(np.random.randn(5, 3))
df4 = pd.DataFrame(np.random.randn(3, 5))

>>> df3.dot(df4)
          0         1         2         3         4
0  0.991673  1.954500  0.322110  0.493841  0.080462
1  0.160482  1.548039 -0.826426  0.972538 -0.048610
2  0.628194  0.482943  0.742597 -0.236226  0.089525
3 -0.098316  0.817702 -0.725945  1.271506 -0.309596
4 -1.053413  0.948427 -2.445940  2.814147 -0.726829
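
One thing the example above does not show is that DataFrame.dot works by label, not only by position. It matches the left frame's column labels to the right frame's index labels before multiplying, and the result takes its row labels from the left frame and its column labels from the right one. The sketch below uses small hypothetical frames (left, right and bad are illustrative names) to show both the label-based reordering and the ValueError pandas raises when the labels do not match:

```python
import pandas as pd

# Left operand: rows labelled x, y; columns labelled a, b (illustrative labels).
left = pd.DataFrame([[1, 2], [3, 4]], index=["x", "y"], columns=["a", "b"])

# Right operand: its index holds the same labels as left's columns,
# but the rows are deliberately stored in the order b, a.
right = pd.DataFrame([[10, 20], [1, 2]], index=["b", "a"], columns=["p", "q"])

# DataFrame.dot pairs left's column "a" with right's row "a" (and "b" with "b")
# by label, whatever order the rows are stored in.
result = left.dot(right)
print(result)  # rows x, y (from left); columns p, q (from right)

# If the labels do not match, pandas raises instead of guessing.
bad = pd.DataFrame([[1, 2], [3, 4]], index=["a", "c"], columns=["p", "q"])
error_message = None
try:
    left.dot(bad)
except ValueError as exc:
    error_message = str(exc)
print("ValueError:", error_message)
```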

Answer by Anton Protopopov

As an alternative to the well-known dot function, you could use numpy.matmul if you have NumPy version >= 1.10.0:

import numpy as np
import pandas as pd

np.random.seed(632)
df1 = pd.DataFrame(np.random.randn(7, 7))
df2 = pd.DataFrame(np.random.randn(7, 7))

In [68]: np.matmul(df1, df2)
Out[68]: 
array([[ 0.08535756, -3.05102895,  3.26148284, -6.27736384, -1.52042691,
         2.40667207, -0.6385153 ],
       [ 5.29731049, -0.94033606, -0.12675555,  1.10453597, -1.70722837,
         2.57797682,  2.37629556],
       [ 0.31841755, -1.46897738, -0.22734008, -4.37852181, -0.98948844,
         3.49939092, -1.36656608],
       [ 0.90757446, -4.6364365 ,  1.86254589, -4.89078986,  0.31928714,
         2.3442364 , -2.29896007],
       [-1.14428758,  6.69735827, -3.8776982 ,  6.87574565,  1.38854952,
        -2.88767356,  1.46302112],
       [ 0.8771236 , -2.01941938,  1.03461007,  0.30331467,  2.39161032,
         0.07345672, -1.30557339],
       [ 0.94310211, -0.54294898,  2.46147932, -3.21588748, -2.98369364,
         3.73941015,  1.31782966]])
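
A caveat worth keeping in mind with the NumPy route: once the data are plain arrays (for example via .values), np.matmul multiplies purely by position and knows nothing about labels. With the default 0 to n-1 labels used in these answers, position and label agree, but if the right frame's rows are stored in a different order than the left frame's columns, the numbers silently differ from DataFrame.dot. The sketch below uses hypothetical frames and calls .values explicitly, rather than relying on how a particular pandas/NumPy version handles np.matmul applied directly to DataFrames:

```python
import numpy as np
import pandas as pd

# Hypothetical frames: right's rows are stored as b, a (not a, b).
left = pd.DataFrame([[1, 2], [3, 4]], index=["x", "y"], columns=["a", "b"])
right = pd.DataFrame([[10, 20], [1, 2]], index=["b", "a"], columns=["p", "q"])

# Plain arrays: np.matmul pairs left's first column with right's first row, etc.
positional = np.matmul(left.values, right.values)

# DataFrame.dot pairs them by label instead.
by_label = left.dot(right)

# Reordering right's rows to follow left's columns makes the two agree.
fixed = np.matmul(left.values, right.loc[left.columns].values)

print(positional)
print(by_label)
print(fixed)
```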

Performance is almost the same:

In [71]: %timeit np.dot(df1, df2)
10000 loops, best of 3: 63.7 μs per loop

In [73]: %timeit np.matmul(df1, df2)
10000 loops, best of 3: 64.2 μs per loop

But both are over three times faster than df1.dot(df2) in this benchmark:

In [82]: %timeit df1.dot(df2)
1000 loops, best of 3: 217 μs per loop
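
If the gap in these timings matters, one option is to keep the plain NumPy product but restore the labels yourself. DataFrame.dot checks and realigns labels on every call, which is plausibly part of its extra cost. The sketch below is an illustration under stated assumptions, not a pandas API: fast_dot is a hypothetical helper that requires left.columns to equal right.index exactly (true for the default 0 to n-1 labels used above), calls np.dot on the underlying arrays, and rebuilds a DataFrame with the left frame's row labels and the right frame's column labels. No particular speedup is claimed; time it with %timeit on your own data and library versions.

```python
import numpy as np
import pandas as pd


def fast_dot(left, right):
    """Hypothetical helper: NumPy matrix product that keeps pandas labels.

    Assumes left.columns and right.index hold the same labels in the same
    order; checks that once instead of realigning the frames.
    """
    if not left.columns.equals(right.index):
        raise ValueError("left.columns must equal right.index")
    values = np.dot(left.values, right.values)
    # Row labels come from the left frame, column labels from the right one.
    return pd.DataFrame(values, index=left.index, columns=right.columns)


np.random.seed(632)
df1 = pd.DataFrame(np.random.randn(7, 7))
df2 = pd.DataFrame(np.random.randn(7, 7))
result = fast_dot(df1, df2)
print(result)
```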