Python: most efficient way to map a function over a numpy array
Original URL: http://stackoverflow.com/questions/35215161/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Most efficient way to map function over numpy array
Asked by Ryan
What is the most efficient way to map a function over a numpy array? The way I've been doing it in my current project is as follows:
import numpy as np
x = np.array([1, 2, 3, 4, 5])
# Obtain array of square of each element in x
squarer = lambda t: t ** 2
squares = np.array([squarer(xi) for xi in x])
However, this seems like it is probably very inefficient, since I am using a list comprehension to construct the new array as a Python list before converting it back to a numpy array.
Can we do better?
Answered by bannana
Answered by satomacoto
How about using numpy.vectorize?
import numpy as np
x = np.array([1, 2, 3, 4, 5])
squarer = lambda t: t ** 2
vfunc = np.vectorize(squarer)
vfunc(x)
# Output : array([ 1, 4, 9, 16, 25])
Answered by user2357112 supports Monica
squares = squarer(x)
Arithmetic operations on arrays are automatically applied elementwise, with efficient C-level loops that avoid all the interpreter overhead that would apply to a Python-level loop or comprehension.
Most of the functions you'd want to apply to a NumPy array elementwise will just work, though some may need changes. For example, an if statement doesn't work elementwise. You'd want to convert those to use constructs like numpy.where:
def using_if(x):
    if x < 5:
        return x
    else:
        return x**2

becomes

import numpy

def using_where(x):
    return numpy.where(x < 5, x, x**2)
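To make the difference concrete, here is a minimal sketch using only numpy (the sample array is an assumption chosen for illustration). Calling the if-based function on an array raises ValueError, because the truth value of a multi-element array is ambiguous, while the where-based version computes both branches for the whole array and selects per element.

```python
import numpy as np

def using_if(x):
    # Works for scalars only: `x < 5` on an array yields an array of booleans
    if x < 5:
        return x
    else:
        return x**2

def using_where(x):
    # Both branches are computed for the whole array, then selected per element
    return np.where(x < 5, x, x**2)

x = np.array([3, 4, 5, 6])  # hypothetical sample data

try:
    using_if(x)
    raised = False
except ValueError:
    # numpy refuses to reduce a multi-element boolean array to a single bool
    raised = True

result = using_where(x)
print(raised, result)
```

Note that numpy.where evaluates x**2 for every element, including those where the condition selects x, so a branch that is invalid for some inputs still gets computed for them.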
Answered by Mike T
TL;DR
As noted by @user2357112, a "direct" method of applying the function is always the fastest and simplest way to map a function over Numpy arrays:
import numpy as np
x = np.array([1, 2, 3, 4, 5])
f = lambda x: x ** 2
squares = f(x)
Generally avoid np.vectorize, as it does not perform well, and has (or had) a number of issues. If you are handling other data types, you may want to investigate the other methods shown below.
Comparison of methods
Here are some simple tests comparing several methods of mapping a function, run with Python 3.6 and NumPy 1.15.4. First, the set-up functions for testing:
import timeit
import numpy as np

f = lambda x: x ** 2
vf = np.vectorize(f)

def test_array(x, n):
    t = timeit.timeit(
        'np.array([f(xi) for xi in x])',
        'from __main__ import np, x, f', number=n)
    print('array: {0:.3f}'.format(t))

def test_fromiter(x, n):
    t = timeit.timeit(
        'np.fromiter((f(xi) for xi in x), x.dtype, count=len(x))',
        'from __main__ import np, x, f', number=n)
    print('fromiter: {0:.3f}'.format(t))

def test_direct(x, n):
    t = timeit.timeit(
        'f(x)',
        'from __main__ import x, f', number=n)
    print('direct: {0:.3f}'.format(t))

def test_vectorized(x, n):
    t = timeit.timeit(
        'vf(x)',
        'from __main__ import x, vf', number=n)
    print('vectorized: {0:.3f}'.format(t))
Testing with five elements (sorted from fastest to slowest):
x = np.array([1, 2, 3, 4, 5])
n = 100000
test_direct(x, n) # 0.265
test_fromiter(x, n) # 0.479
test_array(x, n) # 0.865
test_vectorized(x, n) # 2.906
With 100s of elements:
x = np.arange(100)
n = 10000
test_direct(x, n) # 0.030
test_array(x, n) # 0.501
test_vectorized(x, n) # 0.670
test_fromiter(x, n) # 0.883
And with 1000s of array elements or more:
x = np.arange(1000)
n = 1000
test_direct(x, n) # 0.007
test_fromiter(x, n) # 0.479
test_array(x, n) # 0.516
test_vectorized(x, n) # 0.945
Different versions of Python/NumPy and compiler optimization will have different results, so do a similar test for your environment.
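For repeating such a test in your own environment, here is a self-contained sketch, not the benchmark used above: it passes zero-argument callables to timeit instead of setup strings, so it does not depend on importing from __main__. The array size and repeat count are arbitrary assumptions, and no timings are claimed here.

```python
import timeit
import numpy as np

f = lambda v: v ** 2
vf = np.vectorize(f)
x = np.arange(1000)

# Each candidate is a zero-argument callable, so timeit needs no setup string
candidates = {
    "direct": lambda: f(x),
    "fromiter": lambda: np.fromiter((f(xi) for xi in x), x.dtype, count=len(x)),
    "array": lambda: np.array([f(xi) for xi in x]),
    "vectorized": lambda: vf(x),
}

# Check that all methods agree before comparing their timings
expected = x ** 2
all_equal = all(np.array_equal(fn(), expected) for fn in candidates.values())

timings = {name: timeit.timeit(fn, number=100) for name, fn in candidates.items()}
for name, t in sorted(timings.items(), key=lambda kv: kv[1]):
    print("{0}: {1:.4f}".format(name, t))
```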
Answered by Peiti Li
I believe that in newer versions of numpy (I use 1.13) you can simply pass the numpy array to the function that you wrote for scalar types. The function is then applied to each element of the numpy array and returns another numpy array.
>>> import numpy as np
>>> squarer = lambda t: t ** 2
>>> x = np.array([1, 2, 3, 4, 5])
>>> squarer(x)
array([ 1, 4, 9, 16, 25])
Answered by Nico Schlömer
I've tested all suggested methods plus np.array(map(f, x)) with perfplot (a small project of mine).
Message #1: If you can use numpy's native functions, do that.
If the function you're trying to vectorize already is vectorized (like the x**2 example in the original post), using that is much faster than anything else (note the log scale):
If you actually need vectorization, it doesn't really matter much which variant you use.
Code to reproduce the plots:
import numpy as np
import perfplot
import math

def f(x):
    # return math.sqrt(x)
    return np.sqrt(x)

vf = np.vectorize(f)

def array_for(x):
    return np.array([f(xi) for xi in x])

def array_map(x):
    return np.array(list(map(f, x)))

def fromiter(x):
    return np.fromiter((f(xi) for xi in x), x.dtype)

def vectorize(x):
    return np.vectorize(f)(x)

def vectorize_without_init(x):
    return vf(x)

perfplot.show(
    setup=lambda n: np.random.rand(n),
    n_range=[2 ** k for k in range(20)],
    kernels=[f, array_for, array_map, fromiter, vectorize, vectorize_without_init],
    xlabel="len(x)",
)
Answered by ead
There are numexpr, numba and cython around; the goal of this answer is to take these possibilities into consideration.
But first let's state the obvious: no matter how you map a Python-function onto a numpy-array, it stays a Python function, that means for every evaluation:
- a numpy-array element must be converted to a Python object (e.g. a float).
- all calculations are done with Python objects, which means incurring the overhead of the interpreter, dynamic dispatch and immutable objects.
So which machinery is used to actually loop through the array doesn't play a big role because of the overhead mentioned above - it stays much slower than using numpy's built-in functionality.
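The per-element overhead can be made visible with a small sketch (the counter dictionary and the array size are assumptions for illustration): a direct call enters the Python function once and lets numpy loop internally, whereas a comprehension enters it once per element, each time with a separate scalar object.

```python
import numpy as np

counter = {"calls": 0}

def f(v):
    # Count how often the interpreter enters this Python function
    counter["calls"] += 1
    return v ** 2

x = np.arange(1000)

counter["calls"] = 0
direct = f(x)                           # one call, the loop runs inside numpy
direct_calls = counter["calls"]

counter["calls"] = 0
looped = np.array([f(xi) for xi in x])  # one call per element
looped_calls = counter["calls"]

print(direct_calls, looped_calls)
```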
Let's take a look at the following example:
# numpy-functionality
def f(x):
    return x+2*x*x+4*x*x*x

# python-function as ufunc
import numpy as np
vf=np.vectorize(f)
vf.__name__="vf"
np.vectorize is picked as a representative of the pure-python function class of approaches. Using perfplot (see code in the appendix of this answer) we get the following running times:
We can see that the numpy approach is 10x-100x faster than the pure Python version. The decrease of performance for bigger array sizes is probably because the data no longer fits the cache.
It is also worth mentioning that vectorize uses a lot of memory, so memory usage is often the bottleneck (see related SO-question). Also note that numpy's documentation on np.vectorize states that it is "provided primarily for convenience, not for performance".
When performance is desired, other tools should be used. Besides writing a C extension from scratch, there are the following possibilities:
One often hears that numpy's performance is as good as it gets, because it is pure C under the hood. Yet there is a lot of room for improvement!
The vectorized numpy version uses a lot of additional memory and memory accesses. The numexpr library tries to tile the numpy arrays and thus get better cache utilization:
# fewer cache misses than the numpy-functionality
import numexpr as ne

def ne_f(x):
    return ne.evaluate("x+2*x*x+4*x*x*x")
This leads to the following comparison:
I cannot explain everything in the plot above: we can see a bigger overhead for the numexpr library at the beginning, but because it utilizes the cache better it is about 10 times faster for bigger arrays!
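The tiling idea can be illustrated without numexpr. The following is a rough numpy-only sketch, not numexpr's actual implementation: the helper name chunked_f and the block size of 4096 elements are assumptions for illustration. It evaluates the same polynomial block by block, so the intermediate arrays stay small instead of being as large as the input.

```python
import numpy as np

def f(x):
    # Plain numpy: each intermediate product is a full-size temporary array
    return x + 2*x*x + 4*x*x*x

def chunked_f(x, block=4096):
    # Hypothetical helper: evaluate block by block so temporaries stay small
    out = np.empty_like(x)
    for start in range(0, len(x), block):
        chunk = x[start:start + block]
        out[start:start + block] = chunk + 2*chunk*chunk + 4*chunk*chunk*chunk
    return out

x = np.random.rand(100_000)
same = np.allclose(f(x), chunked_f(x))
print(same)
```

Whether such manual chunking pays off depends on array size and hardware, since the Python-level loop adds overhead of its own; treat it as an illustration of the principle and measure before relying on it.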
Another approach is to jit-compile the function and thus get a real pure-C UFunc. This is numba's approach:
# runtime generated C-function as ufunc
import numba as nb

@nb.vectorize(target="cpu")
def nb_vf(x):
    return x+2*x*x+4*x*x*x
It is 10 times faster than the original numpy-approach:
However, the task is embarrassingly parallelizable, thus we could also use prange in order to calculate the loop in parallel:
@nb.njit(parallel=True)
def nb_par_jitf(x):
    y=np.empty(x.shape)
    for i in nb.prange(len(x)):
        y[i]=x[i]+2*x[i]*x[i]+4*x[i]*x[i]*x[i]
    return y
As expected, the parallel function is slower for smaller inputs, but faster (by almost a factor of 2) for larger sizes:
While numba specializes in optimizing operations with numpy arrays, Cython is a more general tool. It is more complicated to extract the same performance as with numba - often it is down to llvm (numba) vs the local compiler (gcc/MSVC):
%%cython -c=/openmp -a
import numpy as np
import cython

#single core:
@cython.boundscheck(False)
@cython.wraparound(False)
def cy_f(double[::1] x):
    y_out=np.empty(len(x))
    cdef Py_ssize_t i
    cdef double[::1] y=y_out
    for i in range(len(x)):
        y[i] = x[i]+2*x[i]*x[i]+4*x[i]*x[i]*x[i]
    return y_out

#parallel:
from cython.parallel import prange

@cython.boundscheck(False)
@cython.wraparound(False)
def cy_par_f(double[::1] x):
    y_out=np.empty(len(x))
    cdef double[::1] y=y_out
    cdef Py_ssize_t i
    cdef Py_ssize_t n = len(x)
    for i in prange(n, nogil=True):
        y[i] = x[i]+2*x[i]*x[i]+4*x[i]*x[i]*x[i]
    return y_out
Cython results in somewhat slower functions:
Conclusion
Obviously, testing only one function doesn't prove anything. One should also keep in mind that for the chosen example function, memory bandwidth was the bottleneck for sizes larger than 10^5 elements - thus we had the same performance for numba, numexpr and cython in this region.
In the end, the ultimate answer depends on the type of function, hardware, Python distribution and other factors. For example, the Anaconda distribution uses Intel's VML for numpy's functions and thus easily outperforms numba (unless it uses SVML, see this SO-post) for transcendental functions like exp, sin, cos and similar - see e.g. the following SO-post.
Yet from this investigation and from my experience so far, I would say that numba seems to be the easiest tool with the best performance as long as no transcendental functions are involved.
Plotting running times with perfplot-package:
import perfplot

perfplot.show(
    setup=lambda n: np.random.rand(n),
    n_range=[2**k for k in range(0,24)],
    kernels=[
        f,
        vf,
        ne_f,
        nb_vf, nb_par_jitf,
        cy_f, cy_par_f,
    ],
    logx=True,
    logy=True,
    xlabel='len(x)'
)
Answered by Wunderbar
It seems no one has mentioned a built-in factory method for producing a ufunc in the numpy package: np.frompyfunc. I have tested it against np.vectorize, and it outperformed np.vectorize by about 20~30%. Of course it will not perform as well as prescribed C code or even numba (which I have not tested), but it can be a better alternative than np.vectorize.
f = lambda x, y: x * y
f_arr = np.frompyfunc(f, 2, 1)
vf = np.vectorize(f)
arr = np.linspace(0, 1, 10000)
%timeit f_arr(arr, arr) # 307ms
%timeit vf(arr, arr) # 450ms
I have also tested larger samples, and the improvement is proportional. See also the documentation for np.frompyfunc.
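One practical difference worth checking before swapping the two: a ufunc created by np.frompyfunc returns arrays of dtype object, whereas np.vectorize infers a numeric dtype from the first result. A minimal sketch of this, with the conversion step added as an assumption about what a typical caller would want:

```python
import numpy as np

f = lambda a, b: a * b
f_arr = np.frompyfunc(f, 2, 1)   # 2 inputs, 1 output
vf = np.vectorize(f)

arr = np.linspace(0, 1, 10000)

obj_result = f_arr(arr, arr)              # dtype is object
float_result = obj_result.astype(float)   # convert back to a numeric dtype
vec_result = vf(arr, arr)                 # np.vectorize infers float64 here

print(obj_result.dtype, float_result.dtype, vec_result.dtype)
```

The extra astype call costs time and memory too, so it should probably be included when comparing the two approaches.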
Answered by LyteFM
In many cases, numpy.apply_along_axis will be the best choice. It increases the performance by about 100x compared to the other approaches - and not only for trivial test functions, but also for more complex function compositions from numpy and scipy.
When I add the method:
def along_axis(x):
    return np.apply_along_axis(f, 0, x)
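When interpreting timings for this method, it may help to check what the function actually receives. np.apply_along_axis passes 1-D slices along the given axis rather than single elements, so for a 1-D input the function is called once with the whole array. A small sketch that records the shapes passed in (the recording list is an assumption added for illustration):

```python
import numpy as np

call_args = []

def f(v):
    # Record the shape of whatever apply_along_axis passes in
    call_args.append(np.shape(v))
    return np.sqrt(v)

x = np.random.rand(1000)
result = np.apply_along_axis(f, 0, x)

print(len(call_args), call_args[0])
```

This means the approach only works when f itself accepts arrays, in which case it behaves much like calling f(x) directly.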
Answered by Eric Cox
Use numpy.fromfunction(function, shape, **kwargs)
See "https://docs.scipy.org/doc/numpy/reference/generated/numpy.fromfunction.html"
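A short sketch of how this could look, under the assumption that the goal is still to square the elements of an existing array x: numpy.fromfunction builds an array from index coordinates, and it calls the function once with whole index arrays rather than once per element. To map over existing data, the function therefore has to index into x itself, which requires integer indices (dtype=int).

```python
import numpy as np

def square_of_index(i):
    # Called once with an array of indices, not once per element
    return i ** 2

squares = np.fromfunction(square_of_index, (5,))              # float indices by default
int_squares = np.fromfunction(square_of_index, (5,), dtype=int)

# Mapping over existing data: index into x with the integer index array
x = np.array([1, 2, 3, 4, 5])
mapped = np.fromfunction(lambda i: x[i] ** 2, x.shape, dtype=int)

print(squares, int_squares, mapped)
```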