Difference between numpy.array shape (R, 1) and (R,)
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/22053050/
Asked by clwen
In numpy, some operations return shape (R, 1) but some return (R,). This makes matrix multiplication more tedious, since an explicit reshape is required. For example, given a matrix M, suppose we want to do numpy.dot(M[:,0], numpy.ones((1, R))), where R is the number of rows (of course, the same issue also occurs column-wise). We will get a "matrices are not aligned" error, since M[:,0] has shape (R,) but numpy.ones((1, R)) has shape (1, R).
So my questions are:
1. What's the difference between shape (R, 1) and (R,)? I know that literally it's a list of numbers versus a list of lists where each list contains only one number. I'm just wondering why numpy isn't designed to favor shape (R, 1) instead of (R,) for easier matrix multiplication.
2. Are there better ways to do the above example, without explicitly reshaping like this?
numpy.dot(M[:,0].reshape(R, 1), numpy.ones((1, R)))
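The situation described in the question can be reproduced with a small sketch. The matrix M and the value R = 3 here are made up for illustration, and the exact wording of the error message may differ between NumPy versions.

```python
import numpy as np

R = 3
M = np.arange(R * R, dtype=float).reshape(R, R)   # hypothetical example matrix

col = M[:, 0]            # an integer index drops the axis: shape (R,)
row = np.ones((1, R))    # explicit 2-D row: shape (1, R)

try:
    np.dot(col, row)     # (R,) against (1, R): the summed axes have lengths R and 1
    failed = False
except ValueError as exc:
    failed = True
    print("np.dot failed:", exc)

# The explicit reshape from the question gives an (R, 1) by (1, R) product.
fixed = np.dot(col.reshape(R, 1), row)
print(col.shape, row.shape, fixed.shape)
```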
Accepted answer by Gareth Rees
1. The meaning of shapes in NumPy
You write, "I know literally it's list of numbers and list of lists where all list contains only a number", but that's a bit of an unhelpful way to think about it.
The best way to think about NumPy arrays is that they consist of two parts: a data buffer, which is just a block of raw elements, and a view, which describes how to interpret the data buffer.
For example, if we create an array of 12 integers:
>>> a = numpy.arange(12)
>>> a
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
Then a consists of a data buffer, arranged something like this:
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘
and a view which describes how to interpret the data:
>>> a.flags
C_CONTIGUOUS : True
F_CONTIGUOUS : True
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
>>> a.dtype
dtype('int64')
>>> a.itemsize
8
>>> a.strides
(8,)
>>> a.shape
(12,)
Here the shape (12,) means the array is indexed by a single index which runs from 0 to 11. Conceptually, if we label this single index i, the array a looks like this:
i=    0    1    2    3    4    5    6    7    8    9   10   11
   ┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
   │  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
   └────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘
If we reshape an array, this doesn't change the data buffer. Instead, it creates a new view that describes a different way to interpret the data. So after:
>>> b = a.reshape((3, 4))
the array b has the same data buffer as a, but now it is indexed by two indices which run from 0 to 2 and 0 to 3 respectively. If we label the two indices i and j, the array b looks like this:
i=    0    0    0    0    1    1    1    1    2    2    2    2
j=    0    1    2    3    0    1    2    3    0    1    2    3
   ┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
   │  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
   └────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘
which means that:
>>> b[2,1]
9
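The buffer-plus-view picture can be checked directly. The following is a minimal sketch; the dtype is fixed to int64 here (an assumption made so that the byte counts match the 8-byte items shown earlier on every platform).

```python
import numpy as np

a = np.arange(12, dtype=np.int64)   # dtype fixed so itemsize is 8 everywhere
b = a.reshape((3, 4))

# b is a view: it shares a's buffer rather than owning its own.
print(b.base is a)                  # True
print(np.shares_memory(a, b))       # True

# Strides are in bytes: a step in i skips a row of 4 items, a step in j skips 1 item.
print(b.strides)                    # (32, 8)

# The element b[i, j] sits at flat position i*4 + j in the buffer.
i, j = 2, 1
print(b[i, j], a[i * 4 + j])        # 9 9

# Writing through the view changes the original array too.
b[0, 0] = 100
print(a[0])                         # 100
```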
You can see that the second index changes quickly and the first index changes slowly. If you prefer this to be the other way round, you can specify the order parameter:
>>> c = a.reshape((3, 4), order='F')
which results in an array indexed like this:
i=    0    1    2    0    1    2    0    1    2    0    1    2
j=    0    0    0    1    1    1    2    2    2    3    3    3
   ┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
   │  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
   └────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘
which means that:
>>> c[2,1]
5
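The effect of order='F' shows up in the strides. This sketch again assumes an int64 dtype so that each item is 8 bytes.

```python
import numpy as np

a = np.arange(12, dtype=np.int64)
c = a.reshape((3, 4), order='F')

# In Fortran order the first index varies fastest:
# a step in i moves 1 item (8 bytes), a step in j moves 3 items (24 bytes).
print(c.strides)                 # (8, 24)

# The element c[i, j] sits at flat position i + j*3 in the buffer.
i, j = 2, 1
print(c[i, j], a[i + j * 3])     # 5 5
print(c.flags['F_CONTIGUOUS'])   # True
```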
It should now be clear what it means for an array to have a shape with one or more dimensions of size 1. After:
>>> d = a.reshape((12, 1))
the array d is indexed by two indices, the first of which runs from 0 to 11, and the second index is always 0:
i=    0    1    2    3    4    5    6    7    8    9   10   11
j=    0    0    0    0    0    0    0    0    0    0    0    0
   ┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
   │  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
   └────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘
and so:
>>> d[10,0]
10
A dimension of length 1 is "free" (in some sense), so there's nothing stopping you from going to town:
>>> e = a.reshape((1, 2, 1, 6, 1))
giving an array indexed like this:
i=    0    0    0    0    0    0    0    0    0    0    0    0
j=    0    0    0    0    0    0    1    1    1    1    1    1
k=    0    0    0    0    0    0    0    0    0    0    0    0
l=    0    1    2    3    4    5    0    1    2    3    4    5
m=    0    0    0    0    0    0    0    0    0    0    0    0
   ┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
   │  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
   └────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘
and so:
>>> e[0,1,0,0,0]
6
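Length-1 dimensions can be added and removed without touching the data. A small sketch using squeeze and np.newaxis (neither appears in the answer itself, but both are standard NumPy):

```python
import numpy as np

a = np.arange(12)
e = a.reshape((1, 2, 1, 6, 1))

print(e.shape)                    # (1, 2, 1, 6, 1)
print(np.shares_memory(a, e))     # True: still the same 12 elements

# squeeze() drops every axis of length 1 ...
print(e.squeeze().shape)          # (2, 6)

# ... and np.newaxis (an alias for None) inserts one.
d = a[:, np.newaxis]
print(d.shape)                    # (12, 1)
print(d[10, 0])                   # 10
```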
See the NumPy internals documentation for more details about how arrays are implemented.
2. What to do?
Since numpy.reshape just creates a new view (whenever the memory layout allows it), you shouldn't be scared about using it whenever necessary. It's the right tool to use when you want to index an array in a different way.
However, in a long computation it's usually possible to arrange to construct arrays with the "right" shape in the first place, and so minimize the number of reshapes and transposes. But without seeing the actual context that led to the need for a reshape, it's hard to say what should be changed.
The example in your question is:
numpy.dot(M[:,0], numpy.ones((1, R)))
but this is not realistic. First, this expression:
M[:,0].sum()
computes the result more simply. Second, is there really something special about column 0? Perhaps what you actually need is:
M.sum(axis=0)
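To see why a sum is the simpler spelling, here is a sketch with a made-up 3 by 3 matrix; M and R are illustrative values, not taken from the question.

```python
import numpy as np

R = 3
M = np.arange(R * R, dtype=float).reshape(R, R)   # hypothetical example matrix

# Dotting a column with a vector of ones just adds up its entries.
dot_result = np.dot(M[:, 0], np.ones(R))
sum_result = M[:, 0].sum()
print(dot_result, sum_result)          # both are 9.0 for this M

# Summing over axis 0 does the same for every column at once.
col_sums = M.sum(axis=0)
print(col_sums)                        # column sums 9, 12 and 15

# keepdims=True retains the reduced axis with length 1 if a 2-D result is needed.
print(M.sum(axis=0, keepdims=True).shape)   # (1, 3)
```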
Answer by Evan
The difference between (R,) and (1,R) is literally the number of indices that you need to use. ones((1,R)) is a 2-D array that happens to have only one row. ones(R) is a vector. Generally, if it doesn't make sense for the variable to have more than one row/column, you should be using a vector, not a matrix with a singleton dimension.
For your specific case, there are a couple of options:
1) Just make the second argument a vector. The following works fine:
np.dot(M[:,0], np.ones(R))
2) If you want MATLAB-like matrix operations, use the class matrix instead of ndarray. All matrices are forced to be 2-D arrays, and the operator * does matrix multiplication instead of element-wise multiplication (so you don't need dot). In my experience, this is more trouble than it is worth, but it may be nice if you are used to MATLAB.
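As an aside that is not part of the original answer: on Python 3.5 and later, plain ndarrays support the @ operator for matrix multiplication, which covers much of what the matrix class was used for. A minimal sketch, with M and R chosen purely for illustration:

```python
import numpy as np

R = 3
M = np.arange(R * R, dtype=float).reshape(R, R)   # hypothetical example matrix

# 1-D @ 1-D is an inner product and returns a scalar.
inner = M[:, 0] @ np.ones(R)
print(inner)                                  # 9.0

# 2-D @ 1-D returns a 1-D result, with no singleton axis to manage.
mat_vec = M @ np.ones(R)
print(mat_vec.shape)                          # (3,)

# 2-D @ 2-D is ordinary matrix multiplication.
mat_mat = M[:, [0]] @ np.ones((1, R))
print(mat_mat.shape)                          # (3, 3)
```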
Answer by bogatron
1) The reason not to prefer a shape of (R, 1) over (R,) is that it unnecessarily complicates things. Besides, why would it be preferable to have shape (R, 1) by default for a length-R vector instead of (1, R)? It's better to keep it simple and be explicit when you require additional dimensions.
2) For your example, you are computing an outer product, so you can do this without a reshape call by using np.outer:
np.outer(M[:,0], np.ones((1, R)))
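A quick sketch of why np.outer needs no reshape: it flattens its inputs, so the shape of the second argument does not matter. The matrix M and R = 3 are illustrative; the broadcasting line is an alternative spelling added for comparison.

```python
import numpy as np

R = 3
M = np.arange(R * R, dtype=float).reshape(R, R)   # hypothetical example matrix

# np.outer flattens both inputs, so 1-D and (1, R) arguments behave the same.
outer_a = np.outer(M[:, 0], np.ones(R))
outer_b = np.outer(M[:, 0], np.ones((1, R)))

# Broadcasting an (R, 1) column against an (R,) vector gives the same (R, R) array.
outer_c = M[:, 0][:, np.newaxis] * np.ones(R)

print(outer_a.shape)                        # (3, 3)
print(np.array_equal(outer_a, outer_b))     # True
print(np.array_equal(outer_a, outer_c))     # True
```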
Answer by hpaulj
For its base array class, 2d arrays are no more special than 1d or 3d ones. There are some operations that preserve the dimensions, some that reduce them, and others that combine or even expand them.
M=np.arange(9).reshape(3,3)
M[:,0].shape # (3,) selects one column, returns a 1d array
M[0,:].shape # same, one row, 1d array
M[:,[0]].shape # (3,1), index with a list (or array), returns 2d
M[:,[0,1]].shape # (3,2)
In [20]: np.dot(M[:,0].reshape(3,1),np.ones((1,3)))
Out[20]:
array([[ 0., 0., 0.],
[ 3., 3., 3.],
[ 6., 6., 6.]])
In [21]: np.dot(M[:,[0]],np.ones((1,3)))
Out[21]:
array([[ 0., 0., 0.],
[ 3., 3., 3.],
[ 6., 6., 6.]])
Other expressions that give the same array:
np.dot(M[:,0][:,np.newaxis],np.ones((1,3)))
np.dot(np.atleast_2d(M[:,0]).T,np.ones((1,3)))
np.einsum('i,j',M[:,0],np.ones((3)))
M1=M[:,0]; R=np.ones((3)); np.dot(M1[:,None], R[None,:])
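The equivalence of these expressions can be checked in one go. This sketch collects them in a dictionary (the labels are made up for readability) and compares each against the explicit reshape version.

```python
import numpy as np

M = np.arange(9).reshape(3, 3)
ones_row = np.ones((1, 3))

# The explicit reshape is used as the reference result.
reference = np.dot(M[:, 0].reshape(3, 1), ones_row)

candidates = {
    "list index": np.dot(M[:, [0]], ones_row),
    "newaxis":    np.dot(M[:, 0][:, np.newaxis], ones_row),
    "atleast_2d": np.dot(np.atleast_2d(M[:, 0]).T, ones_row),
    "einsum":     np.einsum('i,j', M[:, 0], np.ones(3)),
    "None index": np.dot(M[:, 0][:, None], np.ones(3)[None, :]),
}

for name, result in candidates.items():
    print(name, result.shape, np.allclose(result, reference))
```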
MATLAB started out with just 2D arrays. Newer versions allow more dimensions, but retain the lower bound of 2. But you still have to pay attention to the difference between a row matrix and a column one, that is, between shape (1,3) and (3,1). How often have you written [1,2,3].'? I was going to write row vector and column vector, but with that 2d constraint, there aren't any vectors in MATLAB, at least not in the mathematical sense of a vector as being 1d.
Have you looked at np.atleast_2d (also _1d and _3d versions)?
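For reference, a short sketch of what the atleast_* helpers do to shapes; the input arrays are arbitrary examples.

```python
import numpy as np

v = np.arange(3)                  # shape (3,)

print(np.atleast_1d(5).shape)     # (1,)       a scalar becomes 1-D
print(np.atleast_2d(v).shape)     # (1, 3)     a new axis is added in front
print(np.atleast_2d(v).T.shape)   # (3, 1)     transpose to get a column
print(np.atleast_3d(v).shape)     # (1, 3, 1)

# Arrays that already have enough dimensions pass through unchanged.
M = np.ones((3, 3))
print(np.atleast_2d(M).shape)     # (3, 3)
```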
Answer by Katie Jergens
The shape is a tuple. If there is only 1 dimension, the shape will be one number followed by a comma with nothing after it. For 2+ dimensions, there will be a number after every comma.
# 1 dimension with 2 elements, shape = (2,).
# Note there's nothing after the comma.
z=np.array([ # start dimension
10, # not a dimension
20 # not a dimension
]) # end dimension
print(z.shape)
(2,)
# 2 dimensions, each with 1 element, shape = (2,1)
w=np.array([ # start outer dimension
[10], # element is in an inner dimension
[20] # element is in an inner dimension
]) # end outer dimension
print(w.shape)
(2, 1)
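The trailing comma is ordinary Python tuple syntax rather than anything NumPy-specific. A small sketch, reusing the z and w arrays from the example above:

```python
import numpy as np

# (2,) is a one-element tuple; (2) is just the integer 2 in parentheses.
print(type((2,)))          # <class 'tuple'>
print(type((2)))           # <class 'int'>

z = np.array([10, 20])
w = np.array([[10], [20]])

# ndim is the length of the shape tuple, i.e. the number of indices needed.
print(z.shape, z.ndim)     # (2,) 1
print(w.shape, w.ndim)     # (2, 1) 2

print(z[1])                # 20, one index
print(w[1, 0])             # 20, two indices
```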
Answer by Mikhail_Sam
There are a lot of good answers here already. But for me it was hard to find an example where the shape of an array can break the whole program.
So here is the one:
import numpy as np
a = np.array([1,2,3,4])
b = np.array([10,20,30,40])
from sklearn.linear_model import LinearRegression
regr = LinearRegression()
regr.fit(a,b)
This will fail with an error:
ValueError: Expected 2D array, got 1D array instead
but if we add reshape to a:
a = np.array([1,2,3,4]).reshape(-1,1)
this works correctly!
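What reshape(-1, 1) does can be seen with NumPy alone. In this sketch, the reading of rows as samples and columns as features is an assumption about the convention the estimator expects, not something shown in the answer.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# -1 asks NumPy to infer that axis length from the total number of elements.
col = a.reshape(-1, 1)
print(col.shape)                           # (4, 1): 4 rows, 1 column

# Equivalent spellings of the same column.
print(a[:, np.newaxis].shape)              # (4, 1)
print(np.array_equal(col, a[:, None]))     # True

# A single row with 4 columns would instead be written like this.
print(a.reshape(1, -1).shape)              # (1, 4)
```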

