python Python中优化的点积

Question

提问by Escualo

The dot product of two n-dimensional vectors u=[u1,u2,...un]and v=[v1,v2,...,vn]is is given by u1*v1 + u2*v2 + ... + un*vn.

两个n维向量的点积u=[u1,u2,...un]和v=[v1,v2,...,vn]被由下式给出u1*v1 + u2*v2 + ... + un*vn。

A question posted yesterdayencouraged me to find the fastest way to compute dot products in Python using only the standard library, no third-party modules or C/Fortran/C++ calls.

昨天发布的一个问题鼓励我找到在 Python 中仅使用标准库计算点积的最快方法，没有第三方模块或 C/Fortran/C++ 调用。

I timed four different approaches; so far the fastest seems to be sum(starmap(mul,izip(v1,v2)))(where starmapand izipcome from the itertoolsmodule).

我为四种不同的方法计时；到目前为止最快的似乎是sum(starmap(mul,izip(v1,v2)))（在那里starmap和izip来自itertools模块）。

For the code presented below, these are the elapsed times (in seconds, for one million runs):

对于下面显示的代码，这些是经过的时间（以秒为单位，运行一百万次）：

d0: 12.01215
d1: 11.76151
d2: 12.54092
d3: 09.58523

Can you think of a faster way to do this?

你能想出更快的方法来做到这一点吗？

import timeit # module with timing subroutines                                                              
import random # module to generate random numnbers                                                          
from itertools import imap,starmap,izip
from operator import mul

def v(N=50,min=-10,max=10):
    """Generates a random vector (in an array) of dimension N; the                                          
    values are integers in the range [min,max]."""
    out = []
    for k in range(N):
        out.append(random.randint(min,max))
    return out

def check(v1,v2):
    if len(v1)!=len(v2):
        raise ValueError,"the lenght of both arrays must be the same"
    pass

def d0(v1,v2):
    """                                                                                                     
    d0 is Nominal approach:                                                                                 
    multiply/add in a loop                                                                                  
    """
    check(v1,v2)
    out = 0
    for k in range(len(v1)):
        out += v1[k] * v2[k]
    return out

def d1(v1,v2):
    """                                                                                                     
    d1 uses an imap (from itertools)                                                                        
    """
    check(v1,v2)
    return sum(imap(mul,v1,v2))

def d2(v1,v2):
    """                                                                                                     
    d2 uses a conventional map                                                                              
    """
    check(v1,v2)
    return sum(map(mul,v1,v2))

def d3(v1,v2):
    """                                                                                                     
    d3 uses a starmap (itertools) to apply the mul operator on an izipped (v1,v2)                           
    """
    check(v1,v2)
    return sum(starmap(mul,izip(v1,v2)))

# generate the test vectors                                                                                 
v1 = v()
v2 = v()

if __name__ == '__main__':

    # Generate two test vectors of dimension N                                                              

    t0 = timeit.Timer("d0(v1,v2)","from dot_product import d0,v1,v2")
    t1 = timeit.Timer("d1(v1,v2)","from dot_product import d1,v1,v2")
    t2 = timeit.Timer("d2(v1,v2)","from dot_product import d2,v1,v2")
    t3 = timeit.Timer("d3(v1,v2)","from dot_product import d3,v1,v2")

    print "d0 elapsed: ", t0.timeit()
    print "d1 elapsed: ", t1.timeit()
    print "d2 elapsed: ", t2.timeit()
    print "d3 elapsed: ", t3.timeit()

Notice that the name of the file must be dot_product.pyfor the script to run; I used Python 2.5.1 on a Mac OS X Version 10.5.8.

请注意，文件名必须是dot_product.py要运行的脚本；我在 Mac OS X 版本 10.5.8 上使用了 Python 2.5.1。

EDIT:

编辑：

I ran the script for N=1000 and these are the results (in seconds, for one million runs):

我运行了 N=1000 的脚本，这些是结果（以秒为单位，运行一百万次）：

d0: 205.35457
d1: 208.13006
d2: 230.07463
d3: 155.29670

I guess it is safe to assume that, indeed, option three is the fastest and option two the slowest (of the four presented).

我想可以安全地假设，实际上，选项三是最快的，选项二是最慢的（在所呈现的四个中）。

Answer 1

采纳答案by Seth

Just for fun I wrote a "d4" which uses numpy:

只是为了好玩，我写了一个使用 numpy 的“d4”：

from numpy import dot
def d4(v1, v2): 
    check(v1, v2)
    return dot(v1, v2)

My results (Python 2.5.1, XP Pro sp3, 2GHz Core2 Duo T7200):

我的结果（Python 2.5.1、XP Pro sp3、2GHz Core2 Duo T7200）：

d0 elapsed:  12.1977242918
d1 elapsed:  13.885232341
d2 elapsed:  13.7929552499
d3 elapsed:  11.0952246724

~~d4 elapsed: 56.3278584289 # go numpy!~~

~~d4 elapsed: 56.3278584289 # 麻木了！~~

And, for even more fun, I turned on psyco:

而且，为了更有趣，我打开了 psyco：

d0 elapsed:  0.965477735299
d1 elapsed:  12.5354792299
d2 elapsed:  12.9748163524
d3 elapsed:  9.78255448667

~~d4 elapsed: 54.4599059378~~

~~d4 已过：54.4599059378~~

Based on that, I declare d0 the winner :)

基于此，我宣布 d0 获胜:)

Update

更新

@kaiser.se: I probably should have mentioned that I did convert everything to numpy arrays first:

@kaiser.se：我可能应该提到我确实首先将所有内容转换为 numpy 数组：

from numpy import array
v3 = [array(vec) for vec in v1]
v4 = [array(vec) for vec in v2]

# then
t4 = timeit.Timer("d4(v3,v4)","from dot_product import d4,v3,v4")

And I included check(v1, v2)since it's included in the other tests. Leaving it off would give numpy an unfair advantage (though it looks like it could use one). The array conversion shaved off about a second (much less than I thought it would).

我包括在内，check(v1, v2)因为它包含在其他测试中。离开它会给 numpy 一个不公平的优势（尽管它看起来可以使用一个）。数组转换减少了大约一秒钟（比我想象的要少得多）。

All of my tests were run with N=50.

我所有的测试都是在 N=50 的情况下运行的。

@nikow: I'm using numpy 1.0.4, which is undoubtedly old, it's certainly possible that they've improved performance over the last year and a half since I've installed it.

@nikow：我正在使用 numpy 1.0.4，这无疑是旧的，自从我安装它以来，它们肯定有可能在过去一年半中提高了性能。

Update #2更新 #2

@kaiser.se Wow, you are totally right. I must have been thinking that these were lists of lists or something (I really have no idea what I was thinking ... +1 for pair programming).

@kaiser.se 哇，你说得对。我一定一直认为这些是列表列表或其他东西（我真的不知道我在想什么......结对编程+1）。

How does this look:

这看起来如何：

v3 = array(v1)
v4 = array(v2)

New results:

新结果：

d4 elapsed:  3.22535741274

With Psyco:

使用心理：

d4 elapsed:  2.09182619579

d0 still wins with Psyco, but numpy is probably better overall, especially with larger data sets.

d0 仍然使用 Psyco 获胜，但 numpy 总体上可能更好，尤其是对于更大的数据集。

Yesterday I was a bit bothered my slow numpy result, since presumably numpy is used for a lotof computation and has had a lot of optimization. Obviously though, not bothered enough to check my result :)

昨天我有点困扰我缓慢的 numpy 结果，因为大概 numpy 用于大量计算并进行了大量优化。显然，虽然没有足够的精力检查我的结果:)

Answer 2

回答by u0b34a0f6ae

Here is a comparison with numpy. We compare the fast starmap approach with numpy.dot

这是与 numpy 的比较。我们将快速星图方法与numpy.dot

First, iteration over normal Python lists:

首先，对普通 Python 列表进行迭代：

$ python -mtimeit "import numpy as np; r = range(100)" "np.dot(r,r)"
10 loops, best of 3: 316 usec per loop

$ python -mtimeit "import operator; r = range(100); from itertools import izip, starmap" "sum(starmap(operator.mul, izip(r,r)))"
10000 loops, best of 3: 81.5 usec per loop

Then numpy ndarray:

然后是 numpy ndarray：

$ python -mtimeit "import numpy as np; r = np.arange(100)" "np.dot(r,r)"
10 loops, best of 3: 20.2 usec per loop

$ python -mtimeit "import operator; import numpy as np; r = np.arange(100); from itertools import izip, starmap;" "sum(starmap(operator.mul, izip(r,r)))"
10 loops, best of 3: 405 usec per loop

Seeing this, it seems numpy on numpy arrays is fastest, followed by python functional constructs working with lists.

看到这一点，似乎 numpy 数组上的 numpy 最快，其次是使用列表的 python 函数结构。

Answer 3

回答by SilentGhost

I don't know about faster, but I'd suggest:

我不知道更快，但我建议：

sum(i*j for i, j in zip(v1, v2))

it's much easier to read and doesn't require even standard-library modules.

它更容易阅读，甚至不需要标准库模块。

Answer 4

回答by tzot

Please benchmark this "d2a" function, and compare it to your "d3" function.

请对此“d2a”函数进行基准测试，并将其与您的“d3”函数进行比较。

from itertools import imap, starmap
from operator import mul

def d2a(v1,v2):
    """
    d2a uses itertools.imap
    """
    check(v1,v2)
    return sum(imap(mul, v1, v2))

map(on Python 2.x, which is what I assume you use) unnecessarily creates a dummy list prior to the computation.

map（在 Python 2.x 上，这是我假设您使用的）在计算之前不必要地创建了一个虚拟列表。

python Python中优化的点积

提问by Escualo

采纳答案by Seth

回答by u0b34a0f6ae

回答by SilentGhost

回答by tzot

相关推荐

最近更新

标签

python Python中优化的点积

提问by Escualo

采纳答案by Seth

回答by u0b34a0f6ae

回答by SilentGhost

回答by tzot

相关推荐

Python 装饰器处理文档字符串

python 如何获得小时：分钟

在 python 中逐项划分（划分 termino a termino en python ）

python 将 \r 文本转换为 \n 以便 readlines() 按预期工作

相关推荐

最近更新

标签