Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/15722324/

Date: 2020-08-18 20:47:21  Source: igfitidea

sliding window of M-by-N shape numpy.ndarray

Tags: python, numpy, time-series, sliding-window

Asked by siamii

I have a numpy array of shape (6,2)

[[00,01],
 [10,11],
 [20,21],
 [30,31],
 [40,41],
 [50,51]]

I need a sliding window with step size 1 and window size 3, like this:

[[00,01,10,11,20,21],
 [10,11,20,21,30,31],
 [20,21,30,31,40,41],
 [30,31,40,41,50,51]]

I'm looking for a numpy solution. If your solution could parametrize the shape of the original array as well as the window size and step size, that'd be great.

I found this related answer, "Using strides for an efficient moving average filter", but I don't see how to specify the step size there or how to collapse the window from 3D to a continuous 2D array. There is also "Rolling or sliding window iterator in Python", but that's in pure Python and I'm not sure how efficient it is. Also, it supports single elements but does not join them together at the end if each element has multiple features.

Accepted answer by askewchan

In [1]: import numpy as np

In [2]: a = np.array([[0, 1], [10, 11], [20, 21], [30, 31], [40, 41], [50, 51]])

In [3]: w = np.hstack((a[:-2],a[1:-1],a[2:]))

In [4]: w
Out[4]: 
array([[ 0,  1, 10, 11, 20, 21],
       [10, 11, 20, 21, 30, 31],
       [20, 21, 30, 31, 40, 41],
       [30, 31, 40, 41, 50, 51]])

You could write this as a function like so:

def window_stack(a, stepsize=1, width=3):
    n = a.shape[0]
    # Slice i holds the i-th row of every window; hstack needs a sequence, not a generator
    return np.hstack([a[i:1+n+i-width:stepsize] for i in range(width)])


This doesn't really depend on the shape of the original array, as long as a.ndim == 2. Note that I never use either length in the interactive version. The second dimension of the shape is irrelevant; each row can be as long as you want. Thanks to @Jaime's suggestion, you can do it without checking the shape at all:

def window_stack(a, stepsize=1, width=3):
    # "or None" turns a stop of 0 (reached when i == width - 1) into "slice to the end"
    return np.hstack([a[i:1+i-width or None:stepsize] for i in range(width)])
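
As a usage illustration of the first function above, here is a minimal sketch with a non-default step size. The input checks and the np.arange test array are assumptions added for the example; they are not part of the original answer.

```python
import numpy as np

def window_stack(a, stepsize=1, width=3):
    """Stack `width` consecutive rows of a 2-D array side by side."""
    a = np.asarray(a)
    if a.ndim != 2:
        raise ValueError("expected a 2-D array")
    if width > a.shape[0]:
        raise ValueError("window is longer than the array")
    n = a.shape[0]
    # Slice i holds the i-th row of every window; hstack joins them column-wise.
    return np.hstack([a[i:1 + n + i - width:stepsize] for i in range(width)])

a = np.arange(12).reshape(6, 2)        # rows [0, 1], [2, 3], ..., [10, 11]
stepped = window_stack(a, stepsize=2, width=3)
print(stepped)                         # two windows, starting at rows 0 and 2
```

With stepsize=2 only every second window start is kept, so a (6,2) input yields a (2,6) result instead of (4,6).
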

Answer by user42541

You can do a vectorized sliding window in numpy using fancy indexing.

>>> import numpy as np

>>> a = np.array([[0, 1], [10, 11], [20, 21], [30, 31], [40, 41], [50, 51]])

>>> a
array([[ 0,  1],
       [10, 11],
       [20, 21],                      #define our 2d numpy array
       [30, 31],
       [40, 41],
       [50, 51]])

>>> a = a.flatten()

>>> a
array([ 0,  1, 10, 11, 20, 21, 30, 31, 40, 41, 50, 51])    #flattened numpy array

>>> indexer = np.arange(6)[None, :] + 2*np.arange(4)[:, None]

>>> indexer
array([[ 0,  1,  2,  3,  4,  5],
       [ 2,  3,  4,  5,  6,  7],            #sliding window indices
       [ 4,  5,  6,  7,  8,  9],
       [ 6,  7,  8,  9, 10, 11]])

>>> a[indexer]
array([[ 0,  1, 10, 11, 20, 21],
       [10, 11, 20, 21, 30, 31],            #values of a over sliding window
       [20, 21, 30, 31, 40, 41],
       [30, 31, 40, 41, 50, 51]])

>>> np.sum(a[indexer], axis=1)
array([ 63, 123, 183, 243])         #sum of values in 'a' under the sliding window.

Explanation of what this code is doing:

np.arange(6)[None, :] creates a row vector holding 0 through 5, and np.arange(4)[:, None] creates a column vector holding 0 through 3. Adding them broadcasts to a 4x6 matrix in which each row (six elements long) represents one window, and the number of rows (four) is the number of windows. The factor of 2 makes the window slide 2 elements at a time, which is necessary to step over one row (pair) of the original array per window. Using this index matrix for fancy indexing into the flattened array yields the windows, on which you can then compute aggregates such as sum.
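
The hard-coded 6, 4 and 2 in the indexer can be derived from the array shape, window width and step size. The following is a minimal sketch of that idea; the function name window_indexer and its parameters are hypothetical and not part of the original answer.

```python
import numpy as np

def window_indexer(n_rows, n_cols, width=3, stepsize=1):
    """Index matrix that selects flattened sliding windows from a raveled 2-D array."""
    n_windows = (n_rows - width) // stepsize + 1
    # Offsets inside one window: width rows of n_cols elements each.
    offsets = np.arange(width * n_cols)[None, :]
    # Start of each window in the flattened array: stepsize rows apart.
    starts = (stepsize * n_cols * np.arange(n_windows))[:, None]
    return offsets + starts            # broadcasts to (n_windows, width * n_cols)

a = np.array([[0, 1], [10, 11], [20, 21], [30, 31], [40, 41], [50, 51]])
indexer = window_indexer(*a.shape, width=3, stepsize=1)
windows = a.ravel()[indexer]           # fancy indexing copies the selected values
print(windows.sum(axis=1))
```

For the (6,2) example this reproduces np.arange(6)[None, :] + 2*np.arange(4)[:, None]. Note that fancy indexing returns a copy, so memory use grows with the number of windows.
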

Answer by pbskumar

The solution is

np.lib.stride_tricks.as_strided(a, shape=(4,6), strides=(8,4)).

Using strides is intuitive when you start thinking in terms of pointers/addresses.

as_strided() takes three arguments that matter here:

  1. data
  2. shape
  3. strides

data is the array on which we operate.

To use as_strided() for implementing sliding window functions, we must compute the shape of the output beforehand. In the question, (4,6) is the shape of the output. If the dimensions are not correct, we end up reading garbage values. This is because we are accessing data by moving the pointer by a few bytes at a time (depending on the data type).

Determining the correct value of strides is essential to get the expected results. Before calculating strides, find out the memory occupied by each element using arr.strides[-1]. In this example, each element occupies 4 bytes (a 4-byte integer dtype such as int32; with 8-byte int64 elements every stride below doubles). NumPy arrays are stored in row-major order by default: the first element of the next row is right next to the last element of the current row.

Ex: 0 , 1 | 10, 11 | ...

10 is right next to 1.

Imagine the 2D array reshaped to 1D (this is acceptable because the data is stored in row-major format). The first element of each row in the output is an even-indexed element (index 0, 2, 4, 6, ...) of the 1D array: 0, 10, 20, 30, ...

Therefore, the number of bytes we need to step in memory to move from 0 to 10, 10 to 20, and so on is 2 * the memory size of an element. Each row has a stride of 2 * 4 bytes = 8 bytes. For a given row in the output, all the elements are adjacent to each other in our imaginary 1D array. To get the next element in a row, just take one stride equal to the size of an element. The column stride is therefore 4 bytes.

Therefore, strides=(8,4)

An alternate explanation: the output has a shape of (4,6) and a column stride of 4 bytes. So the first row starts at index 0 and has 6 elements, each spaced 4 bytes apart. After the first row is collected, the second row starts 8 bytes away from the start of the first row. The third row starts 8 bytes away from the start of the second row, and so on.

shape determines the number of rows and columns we need; strides defines the memory steps needed to start a new row and to collect the next column element.
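
Because the byte counts depend on the dtype, it is usually safer to derive the strides from the array itself than to hard-code (8,4). The following is a minimal sketch under that assumption; the function name strided_windows, its parameters, and the bounds check are hypothetical additions, not part of the original answer.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def strided_windows(a, width=3, stepsize=1):
    """Read-only (n_windows, width * n_cols) view of sliding row windows."""
    a = np.ascontiguousarray(a)        # the stride arithmetic assumes row-major layout
    n_rows, n_cols = a.shape
    if width > n_rows:
        raise ValueError("window is longer than the array")
    n_windows = (n_rows - width) // stepsize + 1
    row_stride, item_stride = a.strides   # bytes per input row, bytes per element
    return as_strided(
        a,
        shape=(n_windows, width * n_cols),
        # Next window: skip `stepsize` input rows. Next element: one item.
        strides=(stepsize * row_stride, item_stride),
        writeable=False,               # windows overlap in memory, so forbid writes
    )

a32 = np.array([[0, 1], [10, 11], [20, 21], [30, 31], [40, 41], [50, 51]], dtype=np.int32)
view = strided_windows(a32)
print(view.strides)                    # byte strides for 4-byte elements
print(view)
```

With an int32 input this reproduces shape=(4,6) and strides=(8,4) from the answer; with int64 the same code yields strides of (16,8) without any change.
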

Answer by pylang

A short list comprehension is possible with more_itertools.windowed, from the third-party more_itertools library:

Given

import numpy as np
import more_itertools as mit


a = [["00","01"],
     ["10","11"],
     ["20","21"],
     ["30","31"],
     ["40","41"],
     ["50","51"]]

b = np.array(a)

Code

np.array([list(mit.flatten(w)) for w in mit.windowed(a, n=3)])

or

np.array([[i for item in w for i in item] for w in mit.windowed(a, n=3)])

or

np.array(list(mit.windowed(b.ravel(), n=6)))

Output

array([['00', '01', '10', '11', '20', '21'],
       ['10', '11', '20', '21', '30', '31'],
       ['20', '21', '30', '31', '40', '41'],
       ['30', '31', '40', '41', '50', '51']], 
      dtype='<U2')

Sliding windows of size n=3 are created and flattened. Note that the default step size is 1, i.e. more_itertools.windowed(..., step=1).



Performance

When the input is already an array (b), the accepted answer is fastest.

%timeit np.hstack((a[:-2], a[1:-1], a[2:]))
# 37.5 μs ± 1.88 μs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.hstack((b[:-2], b[1:-1], b[2:]))
# 12.9 μs ± 166 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit np.array([list(mit.flatten(w)) for w in mit.windowed(a, n=3)])
# 23.2 μs ± 1.73 μs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.array([[i for item in w for i in item] for w in mit.windowed(a, n=3)])
# 21.2 μs ± 999 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.array(list(mit.windowed(b.ravel(), n=6)))
# 43.4 μs ± 374 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Note: more_itertools is a third-party library that implements the itertools recipes and many helpful tools.

Answer by loretoparisi

This is a pure Python implementation:

def sliding_window(arr, window=3):
    i = iter(arr)
    a = []
    for e in range(0, window): a.append(next(i))
    yield a
    for e in i:
        a = a[1:] + [e]
        yield a

An example:

# flatten array
flatten = lambda l: [item for sublist in l for item in sublist]

a = [[0,1], [10,11], [20,21], [30,31], [40,41], [50,51]]
w = sliding_window(a, window=3)
print( list(map(flatten,w)) )

[[0, 1, 10, 11, 20, 21], [10, 11, 20, 21, 30, 31], [20, 21, 30, 31, 40, 41], [30, 31, 40, 41, 50, 51]]

Benchmark

import timeit
def benchmark():
  a = [[0,1], [10,11], [20,21], [30,31], [40,41], [50,51]]
  # Note: this only creates the generator; wrap the call in list() to time the actual windowing
  sliding_window(a, window=3)

times = timeit.Timer(benchmark).repeat(3, number=1000)
time_taken = min(times) / 1000
print(time_taken)

1.0944640007437556e-06

Answer by Yahya

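Here is a one-liner using NumPy >= v1.17 (the window width 3 and output width 6 are hard-coded, and range(x.shape[0] - x.shape[1]) gives the right number of windows only because the width happens to be x.shape[1] + 1):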

splits = np.vstack(np.split(x,np.array([[i, i+3] for i in range(x.shape[0] - x.shape[1])]).reshape(-1))).reshape(-1, 6) 

Test

x = np.array([[00,1],
              [10,11],
              [20,21],
              [30,31],
              [40,41],
              [50,51]])

Result

[[ 0  1 10 11 20 21]
 [10 11 20 21 30 31]
 [20 21 30 31 40 41]
 [30 31 40 41 50 51]]

Test Performance On Large Array

import numpy as np
import time

x = np.array(range(1000)).reshape(-1, 2)

all_t = 0.
for i in range(1000):
    start_ = time.time()
    np.vstack(
        np.split(x,np.array([[i, i+3] for i in range(x.shape[0] - x.shape[1])])
                    .reshape(-1))).reshape(-1, 6)
    all_t += time.time() - start_

print('Average Time of 1000 Iterations on Array of Shape '
      '1000 x 2 is: {} Seconds.'.format(all_t/1000.))

Performance Result

Average Time of 1000 Iterations on Array of Shape 1000 x 2 is: 0.0016909 Seconds.
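
As an alternative that does not come from any of the answers above: NumPy 1.20 and later provide numpy.lib.stride_tricks.sliding_window_view, which builds strided windows with bounds checking. The following is a minimal sketch assuming that NumPy version; the wrapper name window_view and its parameters are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def window_view(x, width=3, stepsize=1):
    """Flattened sliding row windows of a 2-D array (requires NumPy >= 1.20)."""
    x = np.asarray(x)
    n_cols = x.shape[1]
    flat = x.ravel()                   # a view if x is C-contiguous, otherwise a copy
    # Every window of width * n_cols consecutive elements, then keep only
    # the ones that start on a row boundary, `stepsize` rows apart.
    return sliding_window_view(flat, width * n_cols)[::stepsize * n_cols]

x = np.array([[0, 1], [10, 11], [20, 21], [30, 31], [40, 41], [50, 51]])
result = window_view(x)
print(result)
```

The result is a read-only view, so copy it with np.array(result) before modifying it.
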