Python: flattening a list of NumPy arrays?
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/33711985/
Flattening a list of NumPy arrays?
Asked by Jerry Zhang
It appears that I have data in the form of a list of NumPy arrays (type() = np.ndarray):
[array([[ 0.00353654]]), array([[ 0.00353654]]), array([[ 0.00353654]]),
array([[ 0.00353654]]), array([[ 0.00353654]]), array([[ 0.00353654]]),
array([[ 0.00353654]]), array([[ 0.00353654]]), array([[ 0.00353654]]),
array([[ 0.00353654]]), array([[ 0.00353654]]), array([[ 0.00353654]]),
array([[ 0.00353654]])]
I am trying to put this into a polyfit function:
m1 = np.polyfit(x, y, deg=2)
However, it returns the error: TypeError: expected 1D vector for x
I assume I need to flatten my data into something like:
[0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654 ...]
I have tried a list comprehension, which usually works on lists of lists, but, as expected, this has not worked:
[val for sublist in risks for val in sublist]
What would be the best way to do this?
Accepted answer by Divakar
You could use numpy.concatenate, which, as the name suggests, concatenates all the elements of such an input list into a single NumPy array, like so -
import numpy as np
out = np.concatenate(input_list).ravel()
If you wish the final output to be a list, you can extend the solution, like so -
out = np.concatenate(input_list).ravel().tolist()
Sample run -
In [24]: input_list
Out[24]:
[array([[ 0.00353654]]),
array([[ 0.00353654]]),
array([[ 0.00353654]]),
array([[ 0.00353654]]),
array([[ 0.00353654]]),
array([[ 0.00353654]]),
array([[ 0.00353654]]),
array([[ 0.00353654]]),
array([[ 0.00353654]]),
array([[ 0.00353654]]),
array([[ 0.00353654]]),
array([[ 0.00353654]]),
array([[ 0.00353654]])]
In [25]: np.concatenate(input_list).ravel()
Out[25]:
array([ 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654,
0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654,
0.00353654, 0.00353654, 0.00353654])
Convert to list -
In [26]: np.concatenate(input_list).ravel().tolist()
Out[26]:
[0.00353654,
0.00353654,
0.00353654,
0.00353654,
0.00353654,
0.00353654,
0.00353654,
0.00353654,
0.00353654,
0.00353654,
0.00353654,
0.00353654,
0.00353654]
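To connect this back to the original polyfit call, here is a minimal sketch, assuming the list of (1, 1) arrays holds the x values and that a matching y exists. The data is made up for illustration (the question's values are all identical, which would make a degree-2 fit degenerate), and the names xs_nested, x and y are hypothetical.

```python
import numpy as np

# Hypothetical stand-in for the question's data: a list of (1, 1) arrays
xs_nested = [np.array([[float(i)]]) for i in range(13)]

# Flatten into a 1-D vector with concatenate + ravel
x = np.concatenate(xs_nested).ravel()

# Made-up y values lying exactly on a parabola, so the fit is easy to check
y = 2.0 * x**2 - 3.0 * x + 1.0

m1 = np.polyfit(x, y, deg=2)   # coefficients, highest degree first
print(x.shape)                 # (13,)
print(np.round(m1, 6))
```

With a 1-D x, polyfit no longer raises the "expected 1D vector for x" error.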
Answer by zsatter14
I came across this same issue and found a solution that combines single-row NumPy arrays (shape (1, n)) of variable length:
np.column_stack(input_list).ravel()
See numpy.column_stack for more info.
Example with variable-length arrays built from your example data:
In [135]: input_list
Out[135]:
[array([[ 0.00353654, 0.00353654]]),
array([[ 0.00353654]]),
array([[ 0.00353654]]),
array([[ 0.00353654, 0.00353654, 0.00353654]])]
In [136]: [i.size for i in input_list] # variable size arrays
Out[136]: [2, 1, 1, 3]
In [137]: np.column_stack(input_list).ravel()
Out[137]:
array([ 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654,
0.00353654, 0.00353654])
Note: Only tested on Python 2.7.12
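The example above works because each array has shape (1, n), so column_stack joins them along the second axis. A minimal sketch of the difference, using made-up arrays: for genuinely 1-D arrays of different lengths, np.column_stack raises a ValueError, and plain np.concatenate is the simpler choice.

```python
import numpy as np

# Single-row 2-D arrays of different widths, as in the example above
rows = [np.array([[1.0, 2.0]]), np.array([[3.0]]), np.array([[4.0, 5.0, 6.0]])]
print(np.column_stack(rows).ravel())        # works: joined along axis 1

# Truly 1-D arrays of different lengths (hypothetical data)
flat = [np.array([1.0, 2.0]), np.array([3.0]), np.array([4.0, 5.0, 6.0])]
try:
    np.column_stack(flat)                   # each 1-D array becomes a column
    column_stack_ok = True
except ValueError:
    column_stack_ok = False                 # columns of unequal height cannot be joined

print(column_stack_ok)
print(np.concatenate(flat))                 # 1-D arrays concatenate directly
```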
Answer by ayorgo
This can also be done with
np.array(list_of_arrays).flatten().tolist()
resulting in
[0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654]
Update
As @aydow points out in the comments, using numpy.ndarray.ravel can be faster if one doesn't care whether the result is a copy or a view:
np.array(list_of_arrays).ravel()
Although, according to the docs:
"When a view is desired in as many cases as possible, arr.reshape(-1) may be preferable."
In other words
np.array(list_of_arrays).reshape(-1)
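The copy-versus-view distinction can be checked directly. A minimal sketch, assuming a list of equally shaped arrays like the one in the question (list_of_arrays is rebuilt here for illustration), using np.shares_memory to test whether each result aliases the stacked array:

```python
import numpy as np

list_of_arrays = [np.array([[0.00353654]]) for _ in range(13)]
stacked = np.array(list_of_arrays)          # shape (13, 1, 1), C-contiguous

raveled = stacked.ravel()
reshaped = stacked.reshape(-1)
flattened = stacked.flatten()

print(stacked.shape)                            # (13, 1, 1)
print(np.shares_memory(stacked, raveled))       # True: view of contiguous data
print(np.shares_memory(stacked, reshaped))      # True: view
print(np.shares_memory(stacked, flattened))     # False: always a copy
```

For a non-contiguous input (for example a transposed 2-D array), ravel has to copy as well.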
My initial suggestion was to use numpy.ndarray.flatten, which returns a copy every time and therefore affects performance.
Let's now see how the time complexity of the above-listed solutions compares, using the perfplot package with a setup similar to the OP's:
import numpy as np
import perfplot

# Benchmark the three flattening methods on an (n, 2) random array
perfplot.show(
    setup=lambda n: np.random.rand(n, 2),
    kernels=[lambda a: a.ravel(),
             lambda a: a.flatten(),
             lambda a: a.reshape(-1)],
    labels=['ravel', 'flatten', 'reshape'],
    n_range=[2**k for k in range(16)],
    xlabel='N')
Here flatten demonstrates piecewise linear complexity, which can be reasonably explained by it making a copy of the initial array, compared to the constant complexities of ravel and reshape, which return a view.
It's also worth noting that, quite predictably, converting the outputs with .tolist() evens out the performance of all three to equally linear.
Answer by kmario23
Another simple approach would be to use numpy.hstack() followed by removing the singleton dimension using squeeze(), as in:
In [61]: np.hstack(list_of_arrs).squeeze()
Out[61]:
array([0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654,
0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654,
0.00353654, 0.00353654, 0.00353654])
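One caveat with squeeze(), shown in a minimal sketch with made-up inputs: it removes every length-1 axis, so a list holding a single (1, 1) array collapses to a 0-d array, whereas ravel() always returns a 1-D result.

```python
import numpy as np

many = [np.array([[0.00353654]]) for _ in range(13)]
single = [np.array([[0.00353654]])]

print(np.hstack(many).squeeze().shape)     # (13,)
print(np.hstack(single).squeeze().shape)   # (): 0-d, not a 1-D vector
print(np.hstack(single).ravel().shape)     # (1,)
```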
Answer by Tim Skov Jacobsen
Another way, using itertools to flatten the array:
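import itertools
import numpy as np

# Recreating array from question
a = [np.array([[0.00353654]])] * 13
# Ravel each (1, 1) array first: chaining the 2-D arrays directly would yield 1-element rows, not scalars
# Make an iterator to yield items of the flattened list and create a list from that iterator
flattened = list(itertools.chain.from_iterable(arr.ravel() for arr in a))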
This solution should be very fast, see https://stackoverflow.com/a/408281/5993892 for more explanation.
If the resulting data structure should be a NumPy array instead, use numpy.fromiter() to exhaust the iterator into an array:
# Make an iterator to yield scalar items (each array is raveled first) and create a numpy array from that iterator
flattened_array = np.fromiter(itertools.chain.from_iterable(arr.ravel() for arr in a), float)
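Note that iterating over a 2-D array yields its rows, not its scalars, so with the question's (1, 1) arrays a single level of chaining produces 1-element arrays. A minimal sketch of the difference, using the same recreated data; raveling each array first is one possible way to get scalars out of the chain:

```python
import itertools
import numpy as np

a = [np.array([[0.00353654]])] * 13

# One level of chaining over 2-D arrays yields rows of shape (1,)
rows = list(itertools.chain.from_iterable(a))
print(type(rows[0]).__name__, rows[0].shape)   # ndarray (1,)

# Raveling each array first makes the chain yield scalars
scalars = itertools.chain.from_iterable(arr.ravel() for arr in a)
flat = np.fromiter(scalars, dtype=float)
print(flat.shape)                              # (13,)
```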
Docs for itertools.chain.from_iterable(): https://docs.python.org/3/library/itertools.html#itertools.chain.from_iterable
Docs for numpy.fromiter(): https://docs.scipy.org/doc/numpy/reference/generated/numpy.fromiter.html