Python: What is the difference between contiguous and non-contiguous arrays?

Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/26998223/

Date: 2020-08-19 01:16:46  Source: igfitidea

What is the difference between contiguous and non-contiguous arrays?

Tags: python, arrays, numpy, memory

Asked by jdeng

In the NumPy manual's entry for the reshape() function, it says

>>> a = np.zeros((10, 2))
# A transpose makes the array non-contiguous
>>> b = a.T
# Taking a view makes it possible to modify the shape without modifying the
# initial object.
>>> c = b.view()
>>> c.shape = (20)
AttributeError: incompatible shape for a non-contiguous array

My questions are:


  1. What are contiguous and non-contiguous arrays? Is this similar to a contiguous memory block in C, as in "What is a contiguous memory block?"
  2. Is there any performance difference between these two? When should we use one or the other?
  3. Why does transpose make the array non-contiguous?
  4. Why does c.shape = (20) throw the error "incompatible shape for a non-contiguous array"?

Thanks for your answer!


Accepted answer by Alex Riley

A contiguous array is just an array stored in an unbroken block of memory: to access the next value in the array, we just move to the next memory address.


Consider the 2D array arr = np.arange(12).reshape(3,4). It looks like this:


[Image: arr drawn as a 3x4 grid]

     0  1  2  3
     4  5  6  7
     8  9 10 11

In the computer's memory, the values of arr are stored like this:

[Image: the values of arr in one unbroken run of consecutive memory addresses]

     0  1  2  3  4  5  6  7  8  9 10 11

This means arr is a C contiguous array because the rows are stored as contiguous blocks of memory. The next memory address holds the next value in that row. If we want to move down a column, we just need to jump over three blocks (e.g. to jump from 0 to 4 means we skip over 1, 2 and 3).
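
The layout described here can be checked from Python. The following is a minimal sketch, assuming a 4-byte integer dtype (np.int32 is chosen here only to keep the numbers small; the default integer size depends on the platform). The name elem_strides is introduced purely for illustration.

```python
import numpy as np

# dtype fixed to int32 (an assumption) so every element is 4 bytes wide
arr = np.arange(12, dtype=np.int32).reshape(3, 4)

# The flags report which kind of contiguity the array has.
print(arr.flags['C_CONTIGUOUS'])   # True
print(arr.flags['F_CONTIGUOUS'])   # False

# Strides expressed in elements rather than bytes:
# one step along a row moves 1 element, one step down a column moves 4.
elem_strides = tuple(s // arr.itemsize for s in arr.strides)
print(elem_strides)                # (4, 1)

# order='K' reads the elements in the order they occur in memory.
memory_order = arr.ravel(order='K').tolist()
print(memory_order)                # [0, 1, 2, ..., 11]
```

Moving down a column advances 4 elements, which is the "jump over three blocks" from the text.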

Transposing the array with arr.T means that C contiguity is lost because adjacent row entries are no longer in adjacent memory addresses. However, arr.T is Fortran contiguous since the columns are in contiguous blocks of memory:

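[Image: arr.T drawn as a 4x3 grid over the same memory block; its columns (0 1 2 3), (4 5 6 7) and (8 9 10 11) are each contiguous in memory]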
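
As a rough check of the claim about arr.T, the sketch below inspects the flags of the transpose and confirms that no data was moved. The variable names arr_t, shares, memory_order and index_order are illustrative only.

```python
import numpy as np

arr = np.arange(12, dtype=np.int32).reshape(3, 4)
arr_t = arr.T                       # a view: no data is copied

print(arr_t.flags['C_CONTIGUOUS'])  # False
print(arr_t.flags['F_CONTIGUOUS'])  # True

# The transpose looks at exactly the same buffer as arr.
shares = np.shares_memory(arr, arr_t)
print(shares)                       # True

# Memory order is unchanged (order='K'), while the row-by-row
# (C order) reading of arr_t jumps around in that memory.
memory_order = arr_t.ravel(order='K').tolist()
index_order = arr_t.ravel(order='C').tolist()
print(memory_order)   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(index_order)    # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
```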



Performance-wise, accessing memory addresses which are next to each other is very often faster than accessing addresses which are more "spread out" (fetching a value from RAM could entail a number of neighbouring addresses being fetched and cached for the CPU). This means that operations over contiguous arrays will often be quicker.

As a consequence of C contiguous memory layout, row-wise operations are usually faster than column-wise operations. For example, you'll typically find that


np.sum(arr, axis=1) # sum the rows

is slightly faster than:


np.sum(arr, axis=0) # sum the columns

Similarly, operations on columns will be slightly faster for Fortran contiguous arrays.
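
One way to measure this yourself is sketched below. The array size and repetition count are arbitrary assumptions, and the result depends heavily on the NumPy version, the array shape and the hardware: the gap may be small, and on some setups it can even go the other way. Treat it as an experiment rather than a benchmark.

```python
import timeit
import numpy as np

big = np.ones((1000, 1000))          # C contiguous by default
big_f = np.asfortranarray(big)       # same values, Fortran contiguous copy

# Time row sums and column sums on both layouts.
for name, a in (("C order", big), ("F order", big_f)):
    t_rows = timeit.timeit(lambda: a.sum(axis=1), number=20)
    t_cols = timeit.timeit(lambda: a.sum(axis=0), number=20)
    print(f"{name}: axis=1 {t_rows:.4f}s, axis=0 {t_cols:.4f}s")

# The layout never changes the result, only (possibly) the speed.
same_rows = np.array_equal(big.sum(axis=1), big_f.sum(axis=1))
same_cols = np.array_equal(big.sum(axis=0), big_f.sum(axis=0))
```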




Finally, why can't we flatten the Fortran contiguous array by assigning a new shape?


>>> arr2 = arr.T
>>> arr2.shape = 12
AttributeError: incompatible shape for a non-contiguous array

In order for this to be possible, NumPy would have to put the rows of arr.T together like this:

[Image: the rows of arr.T laid end to end, i.e. the values in the order 0 4 8 1 5 9 2 6 10 3 7 11]

(Setting the shape attribute directly assumes C order, i.e. NumPy tries to perform the operation row-wise.)

This is impossible to do. For any axis, NumPy needs to have a constant stride length (the number of bytes to move) to get to the next element of the array. Flattening arr.T in this way would require skipping forwards and backwards in memory to retrieve consecutive values of the array.

If we wrote arr2.reshape(12) instead, NumPy would copy the values of arr2 into a new block of memory (since it can't return a view on to the original data for this shape).
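
The difference between a view and a copy can be observed with np.shares_memory. This is a small sketch; flat_c, flat_t and refused are names made up for the illustration.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
arr2 = arr.T

flat_c = arr.reshape(12)    # C contiguous source: a view is enough
flat_t = arr2.reshape(12)   # transposed source: NumPy has to copy

print(np.shares_memory(arr, flat_c))    # True
print(np.shares_memory(arr2, flat_t))   # False
print(flat_t)               # [ 0  4  8  1  5  9  2  6 10  3  7 11]

# Assigning to .shape never copies, so here it has to fail instead.
refused = False
try:
    arr2.shape = 12
except AttributeError:
    refused = True
print(refused)
```

Since reshape silently falls back to a copy, writing to flat_t does not change arr, whereas writing to flat_c does.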

Answer by hpaulj

Maybe this example with 12 different array values will help:


In [207]: x=np.arange(12).reshape(3,4).copy()

In [208]: x.flags
Out[208]: 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  ...
In [209]: x.T.flags
Out[209]: 
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  ...

The C order values are in the order in which they were generated. The transposed ones are not:

In [212]: x.reshape(12,)   # same as x.ravel()
Out[212]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [213]: x.T.reshape(12,)
Out[213]: array([ 0,  4,  8,  1,  5,  9,  2,  6, 10,  3,  7, 11])

You can get 1d views of both


In [214]: x1=x.T

In [217]: x.shape=(12,)

The shape of x can also be changed.

In [220]: x1.shape=(12,)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-220-cf2b1a308253> in <module>()
----> 1 x1.shape=(12,)

AttributeError: incompatible shape for a non-contiguous array

But the shape of the transpose cannot be changed. The data is still in the 0,1,2,3,4... order, which can't be accessed as 0,4,8... in a 1d array.

But a copy of x1 can be changed:

In [227]: x2=x1.copy()

In [228]: x2.flags
Out[228]: 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  ...
In [229]: x2.shape=(12,)

Looking at strides might also help. A stride is how far (in bytes) NumPy has to step to get to the next value. For a 2d array, there will be 2 stride values:

In [233]: x=np.arange(12).reshape(3,4).copy()

In [234]: x.strides
Out[234]: (16, 4)

To get to the next row, step 16 bytes; to get to the next column, step only 4.

In [235]: x1.strides
Out[235]: (4, 16)

Transpose just switches the order of the strides. The next row is now only 4 bytes away, i.e. the next number in the buffer.

In [236]: x.shape=(12,)

In [237]: x.strides
Out[237]: (4,)

Changing the shape also changes the strides: we now just step through the buffer 4 bytes at a time.

In [238]: x2=x1.copy()

In [239]: x2.strides
Out[239]: (12, 4)

Even though x2 looks just like x1, it has its own data buffer, with the values in a different order. The next column is now 4 bytes over, while the next row is 12 (3*4).

In [240]: x2.shape=(12,)

In [241]: x2.strides
Out[241]: (4,)

And as with x, changing the shape to 1d reduces the strides to (4,).


For x1, with data in the 0,1,2,... order, there isn't a 1d stride that would give 0,4,8,....
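
The stride arithmetic can be worked through by hand. In the sketch below, byte_offset is a hypothetical helper (not a NumPy function) that applies the usual rule "offset = sum of index times stride", and the dtype is fixed to int32 as an assumption so that the strides match the 4-byte values shown above.

```python
import numpy as np

x = np.arange(12, dtype=np.int32).reshape(3, 4)
x1 = x.T

def byte_offset(a, index):
    """Hypothetical helper: byte offset of a[index] from the start of a's data."""
    return sum(i * s for i, s in zip(index, a.strides))

print(x.strides)    # (16, 4)
print(x1.strides)   # (4, 16)

# x[1, 2] and x1[2, 1] are the same element at the same offset.
print(byte_offset(x, (1, 2)), byte_offset(x1, (2, 1)))   # 24 24

# Offsets visited when x1 is read row by row (the order a 1d view would need).
offsets = [byte_offset(x1, (i, j)) for i in range(4) for j in range(3)]
print(offsets)      # [0, 16, 32, 4, 20, 36, 8, 24, 40, 12, 28, 44]

# The step between consecutive offsets is not constant, so no single
# 1d stride can describe this walk.
steps = sorted({b - a for a, b in zip(offsets, offsets[1:])})
print(steps)        # [-28, 16]
```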

__array_interface__ is another useful way of displaying array information:

In [242]: x1.__array_interface__
Out[242]: 
{'strides': (4, 16),
 'typestr': '<i4',
 'shape': (4, 3),
 'version': 3,
 'data': (163336056, False),
 'descr': [('', '<i4')]}

The x1 data buffer address will be the same as for x, with which it shares the data. x2 has a different buffer address.
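
The buffer addresses can be compared directly. A minimal sketch, in which data_address is an illustrative helper that reads the first item of the 'data' entry of __array_interface__:

```python
import numpy as np

x = np.arange(12, dtype=np.int32).reshape(3, 4)
x1 = x.T          # view on x's buffer
x2 = x1.copy()    # separate buffer with its own copy of the values

def data_address(a):
    """Illustrative helper: address of the first element of a."""
    return a.__array_interface__['data'][0]

print(data_address(x1) == data_address(x))   # True
print(data_address(x2) == data_address(x))   # False

# np.shares_memory gives the same answer without looking at raw addresses.
print(np.shares_memory(x, x1), np.shares_memory(x, x2))   # True False
```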

You could also experiment with adding an order='F' parameter to the copy and reshape commands.
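
A short sketch of that experiment, again assuming a 4-byte integer dtype so the strides are comparable with the ones above (xf and flat_f are illustrative names):

```python
import numpy as np

x = np.arange(12, dtype=np.int32).reshape(3, 4)

# copy(order='F') keeps the values and shape but stores columns contiguously.
xf = x.copy(order='F')
print(xf.flags['F_CONTIGUOUS'])   # True
print(xf.strides)                 # (4, 12)
print(np.array_equal(x, xf))      # True

# reshape(order='F') reads the elements column by column.
flat_f = x.reshape(12, order='F')
print(flat_f)                     # [ 0  4  8  1  5  9  2  6 10  3  7 11]
```

The flattened result matches x.T.reshape(12) from earlier in the answer, because reading x column by column is the same as reading its transpose row by row.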