Python: What is vectorization?

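Disclaimer: This page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/47755442/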

Date: 2020-08-19 18:21:22  Source: igfitidea

Tags: python, python-3.x, numpy, vectorization

Asked by Jairus Patrick Vallon

What does it mean to vectorize for-loops in Python? Is there another way to write nested for-loops?

在 Python 中向量化 for 循环是什么意思?还有另一种编写嵌套for循环的方法吗?

I am new to Python, and in my research I keep coming across the NumPy library.

我是 Python 新手,在我的研究中,我总是遇到 NumPy 库。

Answer by DeepSpace

Python for loops are inherently slower than their C counterparts, because every iteration executes interpreted bytecode and operates on boxed Python objects.

Pythonfor循环本质上比它们的 C 循环慢。

This is why NumPy offers vectorized operations on NumPy arrays. It pushes the for loop you would usually write in Python down to the C level, which is much faster. NumPy offers vectorized ("C-level for loop") alternatives to things that would otherwise have to be done element by element ("Python-level for loop").

这就是为什么numpynumpy数组上提供矢量化操作的原因。它将for您通常在 Python 中执行的循环推到 C 级别,这要快得多。numpy提供矢量化(“C 级for循环”)替代方案,否则需要以元素方式(“Python 级for循环”)完成。

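import numpy as np
from timeit import Timer

li = list(range(500000))
nump_arr = np.array(li)

def python_for():
    # Python-level loop: one interpreted iteration per element
    return [num + 1 for num in li]

def numpy_add():
    # Vectorized: NumPy runs the loop over elements in compiled C code
    return nump_arr + 1

# repeat(10, 10): 10 trials of 10 calls each; print the fastest trial in seconds
print(min(Timer(python_for).repeat(10, 10)))
print(min(Timer(numpy_add).repeat(10, 10)))

# Output (timings vary by machine):
#  0.725692612368003
#  0.010465986942008954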

The NumPy vectorized addition was about 70 times faster (roughly 0.0105 s versus 0.726 s).
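The same idea applies to loops that do more than add a constant. As a minimal sketch (the array `data` and the threshold 10 are made-up illustrations, not part of the answer), here are two common element-wise loops, a conditional replacement and an accumulation, next to vectorized NumPy equivalents using `np.where` and `ndarray.sum`:

```python
import numpy as np

# Hypothetical input data, chosen only for illustration
data = np.arange(20)

# Python-level loop: replace values below 10 with 0, keep the rest
clipped_loop = []
for x in data:
    clipped_loop.append(x if x >= 10 else 0)

# Vectorized: np.where evaluates the condition for every element at once
clipped_vec = np.where(data >= 10, data, 0)

# Python-level loop: accumulate a sum of squares
total_loop = 0
for x in data:
    total_loop += x * x

# Vectorized: square element-wise, then reduce with a single C-level sum
total_vec = (data * data).sum()

print(np.array_equal(clipped_vec, clipped_loop))
print(total_loop == total_vec)
```

Both pairs compute the same values; the vectorized versions simply replace the explicit Python loop with a single call on the whole array.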

numpy矢量加快X70倍。

Answer by Brad Solomon

Here's a definition from Wes McKinney:

这是Wes McKinney的定义

Arrays are important because they enable you to express batch operations on data without writing any for loops. This is usually called vectorization. Any arithmetic operations between equal-size arrays applies the operation elementwise.
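To make the "equal-size arrays" rule concrete, here is a small sketch (the arrays `a` and `b` are illustrative values, not from the answer). Arithmetic and comparison operators pair elements position by position, and two 1-D arrays of different lengths are rejected with a `ValueError`. NumPy's broadcasting rules do allow some shape mismatches, such as an array combined with a scalar, but not this one.

```python
import numpy as np

# Illustrative equal-size arrays
a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# Each operator combines a[i] with b[i] for every index i; no Python loop is written
print(a + b)    # [11. 22. 33.]
print(b / a)    # [10. 10. 10.]
print(a > 1.5)  # [False  True  True]

# The same element-wise addition written as an explicit Python loop
manual = np.array([x + y for x, y in zip(a, b)])
print(np.array_equal(a + b, manual))  # True

# Arrays whose shapes NumPy cannot broadcast together are rejected
mismatch_rejected = False
try:
    a + np.array([1.0, 2.0])
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)  # True
```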

数组很重要,因为它们使您能够在不编写任何 for 循环的情况下对数据进行批处理。这通常称为矢量化。等长数组之间的任何算术运算都按元素应用运算。

Vectorized version:

矢量化版本:

>>> import numpy as np
>>> arr = np.array([[1., 2., 3.], [4., 5., 6.]])
>>> arr * arr
array([[  1.,   4.,   9.],
       [ 16.,  25.,  36.]])

The same operation using loops over a native Python (nested) list:

与本机 Python(嵌套)列表上的循环相同的事情:

>>> arr = arr.tolist()
>>> res = [[0., 0., 0.], [0., 0., 0.]]
>>> for idx1, row in enumerate(arr):
...     for idx2, val2 in enumerate(row):
...         res[idx1][idx2] = val2 * val2
...
>>> res
[[1.0, 4.0, 9.0], [16.0, 25.0, 36.0]]

How do these two operations compare? The NumPy version takes 436 ns; the Python version takes 3.52 μs (3520 ns), about 8 times longer. This large difference in "small" times is what I call micro-performance, and it becomes important when you're working with larger data or repeating operations thousands or millions of times.
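The answer doesn't show how the 436 ns and 3.52 μs figures were measured. One way to run a similar comparison yourself is with the standard-library `timeit` module. The sketch below is an assumption about method, not the answerer's script, and the function names are hypothetical. It also times a nested list comprehension, which is a more compact way to write nested for loops in plain Python, though it still runs at Python speed. Absolute numbers depend on your machine and your Python and NumPy versions, so no output is shown.

```python
import timeit
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])
lst = arr.tolist()

def numpy_square():
    # Vectorized: one call; the loop over elements runs in C
    return arr * arr

def nested_loops():
    # Explicit nested for loops writing into a preallocated list
    res = [[0., 0., 0.], [0., 0., 0.]]
    for idx1, row in enumerate(lst):
        for idx2, val in enumerate(row):
            res[idx1][idx2] = val * val
    return res

def nested_comprehension():
    # Same result as nested_loops, written as a nested list comprehension
    return [[val * val for val in row] for row in lst]

# Sanity check: all three versions compute the same values
assert numpy_square().tolist() == nested_loops() == nested_comprehension()

n = 10_000
for func in (numpy_square, nested_loops, nested_comprehension):
    # Best of 5 repeats of n calls, converted to nanoseconds per call
    best = min(timeit.repeat(func, number=n, repeat=5))
    print(f"{func.__name__}: {best / n * 1e9:.0f} ns per call")
```

Taking the minimum of several repeats reduces noise from other processes, which matters at nanosecond scales.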

这两个操作如何比较?NumPy 版本需要 436 ns;Python 版本需要 3.52 μs (3520 ns)。这种“小”时间的巨大差异称为微性能,当您处理较大的数据或重复操作数千或数百万次时,这一点变得很重要。