Python numpy array: row-major and column-major

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/20341614/

Date: 2020-08-18 20:15:05  Source: igfitidea

numpy array row major and column major

Tags: python, arrays, numpy

Asked by jmlopez

I'm having trouble understanding how numpy stores its data. Consider the following:

>>> import numpy as np
>>> a = np.ndarray(shape=(2,3), order='F')
>>> for i in xrange(6): a.itemset(i, i+1)
... 
>>> a
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])
>>> a.flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

This says that a is column-major (F_CONTIGUOUS); thus, internally, a should look like the following:

[1, 4, 2, 5, 3, 6]

This is just what is stated in this glossary. What confuses me is that if I try to access the data of a in a linear fashion, I instead get:

>>> for i in xrange(6): print a.item(i)
... 
1.0
2.0
3.0
4.0
5.0
6.0

At this point I'm not sure what the F_CONTIGUOUS flag tells us, since it does not honor the ordering. Apparently everything in Python is row-major, and when we want to iterate in a linear fashion we can use the iterator flat.

The question is the following: given that we have a list of numbers, say 1, 2, 3, 4, 5, 6, how can we create a numpy array of shape (2, 3) in column-major order? That is, how can I get a matrix that looks like this:

array([[ 1.,  3.,  5.],
       [ 2.,  4.,  6.]])

I would really like to be able to iterate linearly over the list and place the values into the newly created ndarray. The reason is that I will be reading files of multidimensional arrays stored in column-major order.

Accepted answer by Kill Console

By default, numpy stores data in row-major order.

>>> a = np.array([[1,2,3,4], [5,6,7,8]])
>>> a.shape
(2, 4)
>>> a.shape = 4,2
>>> a
array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

If you change the shape, the order of the data does not change.

If you pass order='F' to reshape, you get what you want.

>>> b = np.array([1, 2, 3, 4, 5, 6])
>>> b
array([1, 2, 3, 4, 5, 6])
>>> c = b.reshape(2,3,order='F')
>>> c
array([[1, 3, 5],
       [2, 4, 6]])
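
For the file-reading case in the question, the same idea can be sketched as follows. This is only an illustration: `values` is a hypothetical stand-in for numbers read sequentially from a column-major file, and the shapes are made up for the example.

```python
import numpy as np

# Hypothetical stand-in for values read one by one from a column-major file.
values = [1, 2, 3, 4, 5, 6]

# order='F' makes reshape fill the first axis fastest, i.e. column by column.
m = np.array(values, dtype=float).reshape((2, 3), order='F')
print(m)

# Indexing does not depend on memory layout: m[row, col] works as usual.
print(m[1, 0], m[0, 1])

# The same call works for more dimensions. In column-major order,
# element (i, j, k) of a (2, 3, 4) array sits at flat offset i + 2*j + 6*k.
cube = np.arange(24).reshape((2, 3, 4), order='F')
print(cube[1, 2, 3])
```

Here the 2-D result has 1 and 2 in the first column, which is the matrix asked for in the question.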

Answer by Bi Rico

In general, numpy uses order to describe the memory layout, but the Python-level behavior of an array should be the same regardless of its memory layout. I think you can get the behavior you want using views. A view is an array that shares memory with another array. For example:

import numpy as np

a = np.arange(1, 6 + 1)
b = a.reshape(3, 2).T

a[1] = 99
print(b)
# [[ 1  3  5]
#  [99  4  6]]
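
If you want to check that the transposed array really is a view, and what its layout is, something like the following should work. It is a small sketch using `np.shares_memory` and the `flags` and `strides` attributes, not part of the original answer.

```python
import numpy as np

a = np.arange(1, 7)
b = a.reshape(3, 2).T   # transposed view with shape (2, 3)

# b reuses a's buffer instead of copying it.
print(np.shares_memory(a, b))

# Transposing a C-contiguous array yields an F-contiguous view.
print(b.flags.f_contiguous)

# Moving down a column advances one element in memory;
# moving along a row advances two elements.
print(b.strides == (a.itemsize, 2 * a.itemsize))
```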

Hope that helps.

Answer by Matt Hancock

Your question has been answered, but I thought I would add this to explain your observation: "At this point I'm not sure what the F_CONTIGUOUS flag tells us since it does not honor the ordering."

The item method doesn't directly access the data the way you think it does. To do that, you should access the data attribute, which gives you the byte string.

An example:

c = np.array([[1,2,3],
              [4,6,7]], order='C')

f = np.array([[1,2,3],
              [4,6,7]], order='F')

Observe

print c.flags.c_contiguous, f.flags.f_contiguous
# True, True

and

print c.nbytes == len(c.data)
# True

Now let's print the contiguous data for both:

nelements = np.prod(c.shape)
bsize = c.dtype.itemsize # should be 8 bytes for 'int64'
for i in range(nelements):
    bnum = c.data[i*bsize : (i+1)*bsize] # The element as a byte string.
    print np.fromstring(bnum, dtype=c.dtype)[0], # Convert to number.

This prints:

1 2 3 4 6 7

which is what we expect since c is in order 'C', i.e., its data is stored contiguously in row-major order.

On the other hand,

nelements = np.prod(f.shape)
bsize = f.dtype.itemsize # should be 8 bytes for 'int64'
for i in range(nelements):
    bnum = f.data[i*bsize : (i+1)*bsize] # The element as a byte string.
    print np.fromstring(bnum, dtype=f.dtype)[0], # Convert to number.

prints

1 4 2 6 3 7

which, again, is what we expect to see since f's data is stored column-major contiguous.
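
The loops above use Python 2 syntax (print statements, slicing the data buffer as a string). On Python 3 a similar check could be sketched like this. Note that it swaps in `ndarray.tobytes(order='A')`, which copies the buffer, in place of slicing `data` directly, and `memory_order_values` is a hypothetical helper name, not a numpy function.

```python
import numpy as np

def memory_order_values(arr):
    """Return arr's elements in the order they sit in memory.

    Hypothetical helper for illustration; assumes arr is C- or F-contiguous.
    """
    # order='A' means Fortran order if arr is Fortran contiguous, C order otherwise,
    # so for a contiguous array the bytes come out in memory order.
    raw = arr.tobytes(order='A')
    return np.frombuffer(raw, dtype=arr.dtype).tolist()

c = np.array([[1, 2, 3], [4, 6, 7]], order='C')
f = np.array([[1, 2, 3], [4, 6, 7]], order='F')

print(memory_order_values(c))  # row-major sequence
print(memory_order_values(f))  # column-major sequence
```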

Answer by cfh

Here is a simple way to print the data in memory order, using the ravel() function:

>>> import numpy as np
>>> a = np.ndarray(shape=(2,3), order='F')
>>> for i in range(6): a.itemset(i, i+1)

>>> print(a.ravel(order='K'))
[ 1.  4.  2.  5.  3.  6.]

This confirms that the array is stored in Fortran order.
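
As far as I know, itemset is no longer available on recent NumPy versions (2.0 and later), so here is a sketch of the same check that fills the array by plain assignment instead. The use of `np.empty` and of `strides` is my own addition, not part of the original answer.

```python
import numpy as np

# Allocate a Fortran-ordered array, then fill it in place.
# In-place assignment keeps the existing memory layout.
a = np.empty((2, 3), order='F')
a[...] = np.arange(1, 7).reshape(2, 3)

print(a)                    # logical view: rows [1, 2, 3] and [4, 5, 6]
print(a.ravel(order='K'))   # elements in memory order
print(a.strides)            # byte steps per axis; the first axis varies fastest
```

With float64 elements the strides come out as (8, 16): stepping down a column moves 8 bytes, stepping along a row moves 16.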

Answer by KamKam

Wanted to add this in the comments but my rep is too low:

While Kill Console's answer gave the OP's required solution, I think it's important to note that as stated in the numpy.reshape() documentation (https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html):

Note there is no guarantee of the memory layout (C- or Fortran- contiguous) of the returned array.

so even if the view is column-wise, the underlying data may not be, which can make calculations that benefit from column-wise storage in memory less efficient. Perhaps:

a = np.array(np.array([1, 2, 3, 4, 5, 6]).reshape(2,3,order='F'), order='F')

provides more of a guarantee that the data is stored column-wise (see order argument description at https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.array.html).
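
Rather than relying on what reshape happens to return, you can inspect the flags and force the layout. The sketch below uses `np.asfortranarray` as an alternative to the nested `np.array(..., order='F')` call; that substitution is mine, and the variable names are made up for the example.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6])

# The logical result is column-wise, but the docs do not promise its memory layout.
reshaped = values.reshape((2, 3), order='F')

# asfortranarray returns a Fortran-contiguous array, copying only if it has to.
forced = np.asfortranarray(reshaped)

print(reshaped.flags.f_contiguous, forced.flags.f_contiguous)
print(np.array_equal(reshaped, forced))
```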
