pandas: converting a numpy array of arrays to a 2d array
Original question: http://stackoverflow.com/questions/50971123/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Convert numpy array of arrays to 2d array
Asked by Nate Stemen
I have a pandas Series, features, that has the following values (features.values):
array([array([0, 0, 0, ..., 0, 0, 0]), array([0, 0, 0, ..., 0, 0, 0]),
       array([0, 0, 0, ..., 0, 0, 0]), ...,
       array([0, 0, 0, ..., 0, 0, 0]), array([0, 0, 0, ..., 0, 0, 0]),
       array([0, 0, 0, ..., 0, 0, 0])], dtype=object)
Now I really want this to be recognized as a matrix, but if I check the shape I get
>>> features.values.shape
(10000,)
rather than (10000, 3000), which is what I would expect.
How can I get this to be recognized as a 2d array rather than a 1d array with arrays as values? Also, why is it not automatically detected as a 2d array?
Answered by hpaulj
In response to your comment question, let's compare two ways of creating an array.
First, make an array from a list of arrays (all the same length):
In [302]: arr = np.array([np.arange(3), np.arange(1,4), np.arange(10,13)])
In [303]: arr
Out[303]:
array([[ 0,  1,  2],
       [ 1,  2,  3],
       [10, 11, 12]])
The result is a 2d array of numbers.
If instead we make an object dtype array, and fill it with arrays:
In [304]: arr = np.empty(3,object)
In [305]: arr[:] = [np.arange(3), np.arange(1,4), np.arange(10,13)]
In [306]: arr
Out[306]:
array([array([0, 1, 2]), array([1, 2, 3]), array([10, 11, 12])],
      dtype=object)
Notice that this display is like yours. It is, by design, a 1d array. Like a list, it contains pointers to arrays elsewhere in memory. Notice that it requires an extra construction step. The default behavior of np.array is to create a multidimensional array where it can.
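The difference between the two constructions can be checked directly. The following is a minimal sketch, not part of the original answer; the names rows, numeric, and boxed are illustrative, and the object array is filled element by element here rather than by slice assignment.

```python
import numpy as np

rows = [np.arange(3), np.arange(1, 4), np.arange(10, 13)]

# np.array on equal-length rows builds one 2d numeric array.
numeric = np.array(rows)

# An object array allocated up front keeps each row as a separate ndarray.
boxed = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    boxed[i] = row

print(numeric.shape)    # (3, 3)
print(boxed.shape)      # (3,)
print(boxed.dtype)      # object
print(type(boxed[0]))   # <class 'numpy.ndarray'>
```

The object array has shape (3,) because numpy only sees three Python object references; the lengths of the inner arrays are not part of its shape.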
It takes extra effort to get around that. Likewise, it takes some extra effort to undo it and create the 2d numeric array.
Simply calling np.array on it does not change the structure.
In [307]: np.array(arr)
Out[307]:
array([array([0, 1, 2]), array([1, 2, 3]), array([10, 11, 12])],
      dtype=object)
np.stack does change it to 2d. It treats the object array as a list of arrays, which it joins on a new axis.
In [308]: np.stack(arr)
Out[308]:
array([[ 0,  1,  2],
       [ 1,  2,  3],
       [10, 11, 12]])
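Applied to the question's case, the same call should work on the values of a pandas Series. This is a small sketch under assumptions: the Series below is a made-up stand-in for features (4 rows of length 3 instead of 10000 rows of length 3000), and matrix is an illustrative name.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `features` Series.
features = pd.Series([np.arange(i, i + 3) for i in range(4)])

print(features.values.shape)   # (4,)
print(features.values.dtype)   # object

# np.stack joins the equal-length rows along a new first axis.
matrix = np.stack(features.values)
print(matrix.shape)            # (4, 3)
```

np.stack requires every row to have the same shape; if the rows differ in length it raises a ValueError, and the data cannot be represented as a regular 2d numeric array.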