Python What is the preferred way to preallocate NumPy arrays?

Notice: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/3491802/

Time: 2020-08-18 11:24:41  Source: igfitidea

python numpy

Asked by kim busyn

I am new to NumPy/SciPy. From the documentation, it seems more efficient to preallocate a single array rather than call append/insert/concatenate.

For example, to add a column of 1's to an array, I think that this:

import numpy as np
ar0 = np.linspace(10, 20, 16).reshape(4, 4)
ar0[:, -1] = np.ones_like(ar0[:, 0])

is preferred to this:

ar0 = np.linspace(10, 20, 12).reshape(4, 3)
ar0 = np.insert(ar0, ar0.shape[1], np.ones_like(ar0[:,0]), axis=1)

My first question is whether this is correct (that the first is better). My second question: at the moment, I am just preallocating my arrays like this (which I noticed in several of the Cookbook examples on the SciPy site):

np.zeros((8,5))

What is the 'NumPy-preferred' way to do this?

Accepted answer by unutbu

Preallocation allocates all the memory you need in one call, while resizing the array (through calls to append, insert, concatenate or resize) may require copying the array to a larger block of memory. So you are correct: preallocation is preferred over (and should be faster than) resizing.
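As a minimal sketch of the difference (the shapes and fill values here are made up for illustration), the first loop writes into a block allocated once, while the second grows the array with np.append, which returns a new array on every call:

```python
import numpy as np

n_rows, n_cols = 4, 3  # illustrative sizes, not from the question

# Preallocate once, then fill each row in place.
prealloc = np.empty((n_rows, n_cols))
for i in range(n_rows):
    prealloc[i] = np.arange(n_cols) + i

# Grow row by row: every np.append call builds and returns a new array.
grown = np.empty((0, n_cols))
for i in range(n_rows):
    previous = grown
    grown = np.append(grown, [np.arange(n_cols) + i], axis=0)
    assert grown is not previous  # the old array was not extended in place

# Both approaches end with the same contents.
assert np.array_equal(prealloc, grown)
```

For four rows the cost is negligible; the copying only starts to matter when the loop runs many times or the array is large.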

There are a number of "preferred" ways to preallocate numpy arrays, depending on what you want to create: np.zeros, np.ones, np.empty, np.zeros_like, np.ones_like and np.empty_like, plus many others that create useful arrays, such as np.linspace and np.arange.
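A short sketch of the constructors named above; the variable names and the (8, 5) and (4, 3) shapes are only illustrative:

```python
import numpy as np

# An existing array to use as a shape/dtype template.
template = np.linspace(10, 20, 12).reshape(4, 3)

zeros = np.zeros((8, 5))    # every element is 0.0
ones = np.ones((8, 5))      # every element is 1.0
scratch = np.empty((8, 5))  # uninitialized: contents are arbitrary until written

# The *_like variants copy the shape and dtype of an existing array.
zeros_like = np.zeros_like(template)
ones_like = np.ones_like(template)
empty_like = np.empty_like(template)
```

np.empty and np.empty_like are only appropriate when every element will be overwritten before it is read.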

So

ar0 = np.linspace(10, 20, 16).reshape(4, 4)

is just fine if this comes closest to the ar0 you desire.

However, to make the last column all 1's, I think the preferred way would be to just say

ar0[:,-1]=1

Since the shape of ar0[:,-1] is (4,), the 1 is broadcast to match this shape.
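To illustrate the broadcasting, this sketch checks that assigning the scalar gives the same result as assigning an explicit array of ones (the name other is introduced here only for the comparison):

```python
import numpy as np

ar0 = np.linspace(10, 20, 16).reshape(4, 4)
ar0[:, -1] = 1  # the scalar is broadcast across the (4,) column

# Same result using an explicit array of ones, as in the question.
other = np.linspace(10, 20, 16).reshape(4, 4)
other[:, -1] = np.ones_like(other[:, 0])

assert ar0[:, -1].shape == (4,)
assert np.array_equal(ar0, other)
```

The scalar form avoids building the temporary array of ones.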

Answer by Justas

In cases where performance is important, np.empty and np.zeros appear to be the fastest ways to initialize numpy arrays. Note that np.empty does not initialize the values, so every element must be written before it is read.

Below are test results for each method and a few others. Values are in seconds.

>>> import numpy as np
>>> from timeit import timeit
>>> timeit("np.empty(1000000)",number=1000, globals=globals())
0.033749611208094166
>>> timeit("np.zeros(1000000)",number=1000, globals=globals())
0.03421245135849915
>>> timeit("np.arange(0,1000000,1)",number=1000, globals=globals())
1.2212416112155324
>>> timeit("np.ones(1000000)",number=1000, globals=globals())
2.2877375495381145
>>> timeit("np.linspace(0,1000000,1000000)",number=1000, globals=globals())
3.0824269766860652
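The measurements above can be reproduced as a script along the following lines. This is a sketch, not the original benchmark: best_time is a hypothetical helper, and N is smaller than the 1,000,000 used above to keep the run short. Absolute numbers will differ by machine, NumPy version and array size, so no expected output is shown.

```python
import timeit

import numpy as np

N = 100_000  # assumption: reduced size for a quick run


def best_time(stmt, number=50, repeat=3):
    """Hypothetical helper: best total time in seconds over several repeats."""
    times = timeit.repeat(stmt, number=number, repeat=repeat,
                          globals={"np": np, "N": N})
    return min(times)


statements = (
    "np.empty(N)",
    "np.zeros(N)",
    "np.ones(N)",
    "np.arange(0, N, 1)",
    "np.linspace(0, N, N)",
)
results = {stmt: best_time(stmt) for stmt in statements}

# Print fastest first.
for stmt, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{stmt:<24} {seconds:.6f}")
```

Taking the minimum over several repeats reduces noise from other processes on the machine.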