Python: How to make a multidimensional numpy array with a varying row size?

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/3386259/

Time: 2020-08-18 10:46:20  Source: igfitidea

Tags: python, arrays, numpy

Asked by dzhelil

I would like to create a two-dimensional numpy array of arrays that has a different number of elements on each row.

Trying

import numpy
cells = numpy.array([[0,1,2,3], [2,3,4]])

gives an error

ValueError: setting an array element with a sequence.

Accepted answer by Philipp

While Numpy can hold arrays of arbitrary objects, it is optimized for homogeneous arrays of numbers with fixed dimensions. If you really need arrays of arrays, you are better off using a nested list. Depending on the intended use of your data, other data structures might be even better, e.g. a masked array if some of your data points are invalid.
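
As a sketch of the masked-array route mentioned above: pad every row to the length of the longest one and mask the padding, so that NumPy reductions skip it. The helper name `to_masked` and the fill value 0 are assumptions for illustration, not part of NumPy.

```python
import numpy as np

def to_masked(rows, fill=0):
    """Pad ragged rows into a 2-D masked array (hypothetical helper)."""
    lengths = np.array([len(r) for r in rows])
    width = lengths.max()
    # valid[i, j] is True where column j actually exists in row i
    valid = np.arange(width) < lengths[:, None]
    data = np.full((len(rows), width), fill)
    # Boolean assignment fills the True positions in row-major order,
    # which matches the order of the concatenated rows
    data[valid] = np.concatenate(rows)
    return np.ma.masked_array(data, mask=~valid)

cells = to_masked([[0, 1, 2, 3], [2, 3, 4]])
print(cells)               # the padded slot prints as --
print(cells.sum(axis=1))   # per-row sums that ignore the padding: 6 and 9
```

The padded array keeps a true second axis, so column slices such as `cells[:, 0]` work, at the cost of storing the padding.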

If you really want flexible Numpy arrays, use something like this:

numpy.array([[0,1,2,3], [2,3,4]], dtype=object)

However, this will create a one-dimensional array that stores references to lists, which means that you will lose most of the benefits of Numpy (vectorized processing, memory locality, slicing, etc.).
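
A quick check of what that object array actually holds: a 1-D array whose two elements are plain Python lists, so element access goes through the lists and there is no second axis to slice along.

```python
import numpy as np

cells = np.array([[0, 1, 2, 3], [2, 3, 4]], dtype=object)
print(cells.shape, cells.ndim)  # (2,) 1: one dimension, not two
print(type(cells[0]))           # each element is a plain Python list

# Chained indexing works because cells[0] is a list ...
print(cells[0][2])              # 2
# ... but a column slice fails, because the array has only one axis
try:
    cells[:, 0]
except IndexError as err:
    print("IndexError:", err)
```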

Answer by tom10

This isn't well supported in Numpy (by definition, almost everywhere, a "two dimensional array" has all rows of equal length). A Python list of Numpy arrays may be a good solution for you, as this way you'll get the advantages of Numpy where you can use them:

cells = [numpy.array(a) for a in [[0,1,2,3], [2,3,4]]]
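
With a list of arrays, every row is a full NumPy array, so vectorized operations and slicing work inside each row; only work that spans rows needs a Python-level loop or comprehension. A short illustration:

```python
import numpy as np

cells = [np.array(a) for a in [[0, 1, 2, 3], [2, 3, 4]]]

# Vectorized work inside each row
row_means = [c.mean() for c in cells]   # [1.5, 3.0]
scaled = [c * 10 for c in cells]        # each row is still an ndarray
lengths = [len(c) for c in cells]       # [4, 3]

# Slicing works per row, e.g. the last two elements of every row
tails = [c[-2:] for c in cells]         # [array([2, 3]), array([3, 4])]
```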

Answer by calocedrus

We are now almost 7 years after the question was asked, and your code

cells = numpy.array([[0,1,2,3], [2,3,4]])

executed in numpy 1.12.0, python 3.5, doesn't produce any error, and cells contains:

array([[0, 1, 2, 3], [2, 3, 4]], dtype=object)

You access the elements of cells as cells[0][2]  # (=2).
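
This behavior depends on the NumPy version. From NumPy 1.19, building an array from ragged nested sequences without `dtype=object` emits a `VisibleDeprecationWarning`, and since NumPy 1.24 it raises `ValueError` again, as in the original question. Passing `dtype=object` explicitly, as in the accepted answer, gives the object array shown above on old and new versions alike:

```python
import numpy as np

rows = [[0, 1, 2, 3], [2, 3, 4]]

# Explicit dtype=object: accepted on both old and new NumPy versions
cells = np.array(rows, dtype=object)
print(cells.shape)   # (2,)
print(cells[0][2])   # 2

# Without dtype=object, NumPy 1.24+ rejects ragged input
try:
    np.array(rows)
except ValueError as err:
    print("ValueError:", err)
```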

An alternative to tom10's solution, if you want to build your list of numpy arrays on the fly as new elements (i.e. arrays) become available, is to use append:

import numpy as np

d = []                    # initialize an empty list
a = np.arange(3)          # array([0, 1, 2])
d.append(a)               # [array([0, 1, 2])]
b = np.arange(3, -1, -1)  # array([3, 2, 1, 0])
d.append(b)               # [array([0, 1, 2]), array([3, 2, 1, 0])]
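
Once the list is complete, it can be collapsed into one contiguous array plus a start offset per row, which is the layout the next answer builds on. Appending to a Python list does not copy the rows already collected (unlike `np.append`, which always returns a new array), so a common pattern is to append while collecting and concatenate once at the end. The names `flat`, `lengths` and `starts` are illustrative:

```python
import numpy as np

d = []
d.append(np.arange(3))          # array([0, 1, 2])
d.append(np.arange(3, -1, -1))  # array([3, 2, 1, 0])

# Collapse once: one flat array plus each row's length and start offset
flat = np.concatenate(d)                  # array([0, 1, 2, 3, 2, 1, 0])
lengths = np.array([len(x) for x in d])   # array([3, 4])
starts = np.concatenate(([0], np.cumsum(lengths)[:-1]))  # array([0, 3])

# Per-row sums without a Python loop
row_sums = np.add.reduceat(flat, starts)  # array([3, 6])

# np.split recovers the list of rows (as views into flat)
rows_again = np.split(flat, starts[1:])
```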

Answer by Erik

Another option is to store your arrays as one contiguous array, along with their sizes or offsets. This takes a little more thought about how to operate on your arrays, but a surprisingly large number of operations can be made to work as if you had a two-dimensional array with rows of different sizes. In cases where they can't, np.split can be used to create the list that calocedrus recommends. The easiest operations are ufuncs, because they require almost no modification. Here are some examples:

import numpy

cells_flat = numpy.array([0, 1, 2, 3, 2, 3, 4])
# Only one of lengths or starts is required; it's easy to convert between them,
# but having both keeps the examples short
cell_lengths = numpy.array([4, 3])
cell_starts = numpy.insert(cell_lengths[:-1].cumsum(), 0, 0)
cell_lengths2 = numpy.diff(numpy.append(cell_starts, cells_flat.size))
assert numpy.all(cell_lengths == cell_lengths2)

# Copy prevents shared memory
cells = numpy.split(cells_flat.copy(), cell_starts[1:])
# [array([0, 1, 2, 3]), array([2, 3, 4])]

numpy.array([x.sum() for x in cells])
# array([6, 9])
numpy.add.reduceat(cells_flat, cell_starts)
# array([6, 9])

[a + v for a, v in zip(cells, [1, 3])]
# [array([1, 2, 3, 4]), array([5, 6, 7])]
cells_flat + numpy.repeat([1, 3], cell_lengths)
# array([1, 2, 3, 4, 5, 6, 7])

[a.astype(float) / a.sum() for a in cells]
# [array([ 0.        ,  0.16666667,  0.33333333,  0.5       ]),
#  array([ 0.22222222,  0.33333333,  0.44444444])]
cells_flat.astype(float) / numpy.add.reduceat(cells_flat, cell_starts).repeat(cell_lengths)
# array([ 0.        ,  0.16666667,  0.33333333,  0.5       ,  0.22222222,
#         0.33333333,  0.44444444])

def complex_modify(array):
    """Some complicated function that modifies array

    pretend this is more complex than it is"""
    array *= 3

for arr in cells:
    complex_modify(arr)
cells
# [array([0, 3, 6, 9]), array([ 6,  9, 12])]
for arr in numpy.split(cells_flat, cell_starts[1:]):
    complex_modify(arr)
cells_flat
# array([ 0,  3,  6,  9,  6,  9, 12])
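
Two more operations on the same flat layout, as a sketch. One caveat with `add.reduceat` is empty rows: when two consecutive offsets are equal it returns the single element at that offset rather than 0, and an empty last row produces an out-of-range offset. A per-element row label built with `numpy.repeat` lets `numpy.bincount` compute per-row sums that handle empty rows, and row `i` can be read as a view with a plain slice. The names `row_ids` and `row`, and the choice of NaN as the mean of an empty row, are illustrative assumptions:

```python
import numpy as np

# Same data as the first example, plus an empty third row
cells_flat = np.array([0, 1, 2, 3, 2, 3, 4])
cell_lengths = np.array([4, 3, 0])
cell_starts = np.concatenate(([0], cell_lengths.cumsum()[:-1]))  # [0, 4, 7]

# Label every element with the index of its row: [0 0 0 0 1 1 1]
row_ids = np.repeat(np.arange(cell_lengths.size), cell_lengths)

# Per-row sums; minlength keeps trailing empty rows (their sum is 0)
row_sums = np.bincount(row_ids, weights=cells_flat, minlength=cell_lengths.size)
# array([6., 9., 0.])  (weights make the result float)

# Per-row means; empty rows are left as NaN instead of dividing by zero
row_means = np.divide(row_sums, cell_lengths,
                      out=np.full_like(row_sums, np.nan),
                      where=cell_lengths > 0)

def row(i):
    """Return row i as a view into cells_flat (hypothetical helper)."""
    return cells_flat[cell_starts[i]:cell_starts[i] + cell_lengths[i]]
```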

Answer by Roberto Vázquez Lucerga

In numpy 1.14.3, using append:

import numpy as np

d = []                    # initialize an empty list
a = np.arange(3)          # array([0, 1, 2])
d.append(a)               # [array([0, 1, 2])]
b = np.arange(3, -1, -1)  # array([3, 2, 1, 0])
d.append(b)               # [array([0, 1, 2]), array([3, 2, 1, 0])]

What you get is a list of arrays (which can be of different lengths), and you can do operations like d[0].mean(). On the other hand,

cells = numpy.array([[0,1,2,3], [2,3,4]])

results in an array of lists.

You may want to do this:

import numpy as np

a1 = np.array([1,2,3])
a2 = np.array([3,4])
a3 = np.array([a1,a2], dtype=object)  # dtype=object is required on NumPy 1.24+
a3 # array([array([1, 2, 3]), array([3, 4])], dtype=object)
type(a3) # numpy.ndarray
type(a2) # numpy.ndarray
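
One caveat with `np.array([a1, a2], dtype=object)`: it only yields a 1-D array of arrays because the lengths differ. If the arrays happen to have equal lengths, NumPy stacks them into an ordinary 2-D array instead. A common workaround, shown here as a sketch, is to allocate an empty object array and fill it one element at a time, which gives the same 1-D shape whatever the lengths. The helper name `ragged` is hypothetical:

```python
import numpy as np

def ragged(arrays):
    """Pack arrays into a 1-D object array, one element per array (hypothetical helper)."""
    out = np.empty(len(arrays), dtype=object)
    for i, arr in enumerate(arrays):
        out[i] = arr  # assign one at a time so NumPy never tries to stack them
    return out

a1 = np.array([1, 2, 3])
a2 = np.array([3, 4])
same_len = np.array([4, 5, 6])  # same length as a1

print(np.array([a1, same_len]).shape)  # (2, 3): equal lengths are stacked into 2-D
print(ragged([a1, a2]).shape)          # (2,)
print(ragged([a1, same_len]).shape)    # (2,): still one element per array
```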