Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/33711985/

Time: 2020-08-19 13:50:19  Source: igfitidea

Flattening a list of NumPy arrays?

python arrays numpy list-comprehension

Asked by Jerry Zhang

It appears that I have data in the format of a list of NumPy arrays (type() = np.ndarray):

[array([[ 0.00353654]]), array([[ 0.00353654]]), array([[ 0.00353654]]), 
array([[ 0.00353654]]), array([[ 0.00353654]]), array([[ 0.00353654]]), 
array([[ 0.00353654]]), array([[ 0.00353654]]), array([[ 0.00353654]]), 
array([[ 0.00353654]]), array([[ 0.00353654]]), array([[ 0.00353654]]),
array([[ 0.00353654]])]

I am trying to put this into a polyfit function:

m1 = np.polyfit(x, y, deg=2)

However, it returns the error: TypeError: expected 1D vector for x

I assume I need to flatten my data into something like:

[0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654 ...]

I have tried a list comprehension, which usually works on lists of lists, but, as expected, it has not worked here:

[val for sublist in risks for val in sublist]

What would be the best way to do this?

Accepted answer by Divakar

You could use numpy.concatenate, which as the name suggests, basically concatenates all the elements of such an input list into a single NumPy array, like so -

import numpy as np
out = np.concatenate(input_list).ravel()

If you wish the final output to be a list, you can extend the solution, like so -

out = np.concatenate(input_list).ravel().tolist()

Sample run -

In [24]: input_list
Out[24]: 
[array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]])]

In [25]: np.concatenate(input_list).ravel()
Out[25]: 
array([ 0.00353654,  0.00353654,  0.00353654,  0.00353654,  0.00353654,
        0.00353654,  0.00353654,  0.00353654,  0.00353654,  0.00353654,
        0.00353654,  0.00353654,  0.00353654])

Convert to list -

In [26]: np.concatenate(input_list).ravel().tolist()
Out[26]: 
[0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654]
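
To tie this back to the polyfit call in the question, here is a minimal sketch. The x and y values are made up for illustration (the question only shows 13 identical values, which would not give a meaningful fit), and the names x_list and y_list are hypothetical:

```python
import numpy as np

# Hypothetical data: each sample is wrapped in a (1, 1) array, as in the question.
x_list = [np.array([[float(v)]]) for v in range(6)]
y_list = [np.array([[2.0 * v**2 + 3.0 * v + 1.0]]) for v in range(6)]

# Flatten both lists into 1-D vectors before fitting.
x = np.concatenate(x_list).ravel()
y = np.concatenate(y_list).ravel()

# polyfit returns coefficients with the highest degree first.
coeffs = np.polyfit(x, y, deg=2)
print(x.shape, y.shape)
print(coeffs)
```

Once both inputs are 1-D, np.polyfit no longer raises the "expected 1D vector for x" error.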

Answer by zsatter14

I came across this same issue and found a solution that combines NumPy arrays of variable length (row arrays of shape (1, n), as in the example below):

np.column_stack(input_list).ravel()

See numpy.column_stack for more info.

Example with variable-length arrays built from your example data:

In [135]: input_list
Out[135]: 
[array([[ 0.00353654,  0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654,  0.00353654,  0.00353654]])]

In [136]: [i.size for i in input_list]    # variable size arrays
Out[136]: [2, 1, 1, 3]

In [137]: np.column_stack(input_list).ravel()
Out[137]: 
array([ 0.00353654,  0.00353654,  0.00353654,  0.00353654,  0.00353654,
        0.00353654,  0.00353654])

Note: Only tested on Python 2.7.12
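
A small sketch of what is going on here, using made-up values so the ordering is visible. For 2-D inputs, np.column_stack places the arrays side by side, which should match np.concatenate along axis 1. For genuinely 1-D arrays of different lengths, column_stack would treat each one as a column and fail on the length mismatch, so plain np.concatenate is the safer choice there:

```python
import numpy as np

# Row arrays of shape (1, n) with varying n, mirroring the sample run above.
rows = [np.array([[1.0, 2.0]]), np.array([[3.0]]), np.array([[4.0, 5.0, 6.0]])]

# 2-D inputs are stacked side by side, so both calls give the same result.
flat_cs = np.column_stack(rows).ravel()
flat_cat = np.concatenate(rows, axis=1).ravel()

# Truly 1-D arrays of different lengths: concatenate handles these directly.
vectors = [np.array([1.0, 2.0]), np.array([3.0]), np.array([4.0, 5.0, 6.0])]
flat_1d = np.concatenate(vectors)

print(flat_cs)
print(flat_cat)
print(flat_1d)
```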

Answer by ayorgo

This can also be done with

np.array(list_of_arrays).flatten().tolist()

resulting in

[0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654]
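
As a rough illustration of the intermediate step, assuming every array in the list has the same shape (as in the question), np.array first stacks the list into a single 3-D array, which flatten then collapses:

```python
import numpy as np

# Recreate the list from the question: 13 arrays of shape (1, 1).
list_of_arrays = [np.array([[0.00353654]])] * 13

stacked = np.array(list_of_arrays)
print(stacked.shape)  # (13, 1, 1)

# flatten() collapses all dimensions; tolist() converts to plain Python floats.
flat = stacked.flatten().tolist()
print(len(flat))  # 13
```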


Update

As @aydow points out in the comments, using numpy.ndarray.ravel can be faster if one doesn't care whether the result is a copy or a view

np.array(list_of_arrays).ravel()

Although, according to docs

When a view is desired in as many cases as possible, arr.reshape(-1) may be preferable.

In other words

np.array(list_of_arrays).reshape(-1)

My initial suggestion was to use numpy.ndarray.flatten, which returns a copy every time, and that affects performance.
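
The copy-versus-view difference can be checked directly with np.shares_memory. This is only a sketch on a small made-up array, not a benchmark:

```python
import numpy as np

a = np.arange(6.0).reshape(3, 2)  # C-contiguous array

print(np.shares_memory(a, a.ravel()))      # True: view for contiguous input
print(np.shares_memory(a, a.reshape(-1)))  # True: view for contiguous input
print(np.shares_memory(a, a.flatten()))    # False: flatten always copies

# For a non-contiguous array (here a transpose), ravel has to copy as well.
print(np.shares_memory(a.T, a.T.ravel()))  # False
```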

Let's now see how the time complexity of the above-listed solutions compares, using the perfplot package for a setup similar to the OP's

import numpy as np
import perfplot

perfplot.show(
    setup=lambda n: np.random.rand(n, 2),
    kernels=[lambda a: a.ravel(),
             lambda a: a.flatten(),
             lambda a: a.reshape(-1)],
    labels=['ravel', 'flatten', 'reshape'],
    n_range=[2**k for k in range(16)],
    xlabel='N')

[Figure: perfplot runtime comparison of ravel, flatten, and reshape against N]

Here flatten shows piecewise-linear complexity, which is reasonably explained by its copying the initial array, compared with the constant complexity of ravel and reshape, which return a view.

It's also worth noting that, quite predictably, converting the outputs with .tolist() evens out the performance of all three to equally linear.

Answer by kmario23

Another simple approach would be to use numpy.hstack() followed by removing the singleton dimension using squeeze(), as in:

In [61]: np.hstack(list_of_arrs).squeeze()
Out[61]: 
array([0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654,
       0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654,
       0.00353654, 0.00353654, 0.00353654])
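
One edge case may be worth keeping in mind with squeeze(): it removes every length-1 axis, so a list holding a single (1, 1) array collapses to a 0-d array rather than a 1-D vector. The sketch below contrasts this with ravel(), which always gives a 1-D result:

```python
import numpy as np

list_of_arrs = [np.array([[0.00353654]])] * 13
print(np.hstack(list_of_arrs).shape)            # (1, 13)
print(np.hstack(list_of_arrs).squeeze().shape)  # (13,)

# With only one element, squeeze() drops both axes.
single = [np.array([[0.00353654]])]
print(np.hstack(single).squeeze().shape)  # ()
print(np.hstack(single).ravel().shape)    # (1,)
```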

Answer by Tim Skov Jacobsen

Another way is to use itertools to flatten the array:

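import itertools

import numpy as np

# Recreating array from question
a = [np.array([[0.00353654]])] * 13

# Make an iterator to yield items of the flattened list and create a list from that iterator.
# Each array is 2-D, so ravel() it first; chaining the arrays directly would yield
# 1-element row arrays instead of scalars.
flattened = list(itertools.chain.from_iterable(arr.ravel() for arr in a))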

This solution should be very fast, see https://stackoverflow.com/a/408281/5993892 for more explanation.

If the resulting data structure should be a numpy array instead, use numpy.fromiter() to exhaust the iterator into an array:

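# Make an iterator to yield items of the flattened list and create a numpy array from that iterator
# (ravel() each 2-D array so the iterator yields scalars rather than 1-element rows)
flattened_array = np.fromiter(itertools.chain.from_iterable(arr.ravel() for arr in a), float)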
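
A self-contained variant for arrays of different lengths, as a sketch with made-up values. It assumes each array is raveled before chaining, and it passes the optional count argument of numpy.fromiter() so the output can be preallocated; the name total is just for illustration:

```python
import itertools

import numpy as np

# Hypothetical input: row arrays of varying length.
arrays = [np.array([[1.0, 2.0]]), np.array([[3.0]]), np.array([[4.0, 5.0, 6.0]])]

# ravel() each array so the chained iterator yields scalars rather than rows.
values = itertools.chain.from_iterable(arr.ravel() for arr in arrays)

# count is optional; supplying the total size lets fromiter preallocate.
total = sum(arr.size for arr in arrays)
flattened_array = np.fromiter(values, dtype=float, count=total)
print(flattened_array)
```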

Docs for itertools.chain.from_iterable(): https://docs.python.org/3/library/itertools.html#itertools.chain.from_iterable

Docs for numpy.fromiter(): https://docs.scipy.org/doc/numpy/reference/generated/numpy.fromiter.html
