pandas: How to break a numpy array into smaller chunks/batches, then iterate through them

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/39622639/

Time: 2020-09-14 02:03:50  Source: igfitidea

How to break numpy array into smaller chunks/batches, then iterate through them

python, pandas, numpy

Asked by Leb_Broth

Suppose I have this numpy array:

[[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[10, 11, 12]]

And I want to split it into 2 batches and then iterate over them:

[[1, 2, 3],      Batch 1
[4, 5, 6]]

[[7, 8, 9],      Batch 2
[10, 11, 12]]

What is the simplest way to do it?

EDIT: I'm sorry I left this out: as I carry on with the iteration, the original array would be destroyed by splitting it and iterating over the batches. Once the batch iteration has finished, I need to restart from the first batch, so the original array must be preserved. The whole idea is to be consistent with Stochastic Gradient Descent algorithms, which require iterating over batches. In a typical example, I could have a for loop of 100000 iterations over just 1000 batches, which have to be replayed again and again.

Answer by piRSquared

Consider the array a:

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])

Option 1
Use reshape and //

a.reshape(a.shape[0] // 2, -1, a.shape[1])

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

Option 2
If you want groups of two rather than two groups

a.reshape(-1, 2, a.shape[1])

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])
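
With four rows, "two groups" and "groups of two" give the same result, so the difference is easy to miss. The following is a minimal sketch on an assumed six-row array (not part of the original answer) that makes the two reshapes distinguishable and shows that iterating over the result leaves the original array intact. Note that reshape only works when the number of rows divides evenly; otherwise it raises a ValueError.

```python
import numpy as np

# Assumed example data: six rows make the two reshapes differ.
a = np.arange(1, 19).reshape(6, 3)

two_groups = a.reshape(2, -1, a.shape[1])     # 2 batches of 3 rows each
groups_of_two = a.reshape(-1, 2, a.shape[1])  # 3 batches of 2 rows each

print(two_groups.shape)     # (2, 3, 3)
print(groups_of_two.shape)  # (3, 2, 3)

# Iterating over the first axis yields one batch at a time.
for batch in groups_of_two:
    print(batch.shape)      # (2, 3) for every batch

# Here the reshaped array shares memory with `a`, so nothing is consumed.
print(np.shares_memory(a, groups_of_two))  # True
```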

Option 3
Use a generator

def get_every_n(a, n=2):
    for i in range(a.shape[0] // n):
        yield a[n*i:n*(i+1)]

for sa in get_every_n(a, n=2):
    print(sa)

[[1 2 3]
 [4 5 6]]
[[ 7  8  9]
 [10 11 12]]
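
The generator above silently drops trailing rows when the row count is not a multiple of n. As a hedged variation (the name iter_batches and the epoch loop are illustrative assumptions, not part of the original answer), the sketch below also yields the shorter last batch and shows that the batches can be replayed for several epochs simply by creating a new generator each time.

```python
import numpy as np

def iter_batches(a, n=2):
    """Yield consecutive slices of at most n rows, including a shorter last one."""
    for start in range(0, a.shape[0], n):
        yield a[start:start + n]

a = np.arange(1, 16).reshape(5, 3)  # 5 rows, not divisible by 2

# Hypothetical training loop: each epoch builds a fresh generator over `a`.
n_epochs = 3
batch_sizes = []
for epoch in range(n_epochs):
    batch_sizes.append([batch.shape[0] for batch in iter_batches(a, n=2)])

print(batch_sizes)  # [[2, 2, 1], [2, 2, 1], [2, 2, 1]]
```

Because each slice is a view of a, the original array is never modified or used up between epochs.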

Answer by Divakar

You can use numpy.split to split the array along the first axis into n equal parts, where n is the number of desired batches. The implementation would look like this:

np.split(arr,n,axis=0) # n is number of batches

Since the default value for axis is 0, we can skip setting it. So we would simply have:

np.split(arr,n)

Sample runs:

In [132]: arr  # Input array of shape (10,3)
Out[132]: 
array([[170,  52, 204],
       [114, 235, 191],
       [ 63, 145, 171],
       [ 16,  97, 173],
       [197,  36, 246],
       [218,  75,  68],
       [223, 198,  84],
       [206, 211, 151],
       [187, 132,  18],
       [121, 212, 140]])

In [133]: np.split(arr,2) # Split into 2 batches
Out[133]: 
[array([[170,  52, 204],
        [114, 235, 191],
        [ 63, 145, 171],
        [ 16,  97, 173],
        [197,  36, 246]]), array([[218,  75,  68],
        [223, 198,  84],
        [206, 211, 151],
        [187, 132,  18],
        [121, 212, 140]])]

In [134]: np.split(arr,5) # Split into 5 batches
Out[134]: 
[array([[170,  52, 204],
        [114, 235, 191]]), array([[ 63, 145, 171],
        [ 16,  97, 173]]), array([[197,  36, 246],
        [218,  75,  68]]), array([[223, 198,  84],
        [206, 211, 151]]), array([[187, 132,  18],
        [121, 212, 140]])]
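
To connect this with the replay requirement in the question, here is a rough sketch (the array contents, the epoch count and the mean call are placeholders) that splits once and then loops over the same list of batches repeatedly. In this example the pieces returned by np.split share memory with arr, so the original array stays unchanged.

```python
import numpy as np

arr = np.arange(30).reshape(10, 3)  # assumed stand-in for the (10, 3) input
original = arr.copy()

batches = np.split(arr, 5)  # list of 5 arrays, each of shape (2, 3)

for epoch in range(3):          # replay the same batches several times
    for batch in batches:
        _ = batch.mean(axis=0)  # placeholder for a real update step

print(len(batches), batches[0].shape)     # 5 (2, 3)
print(np.shares_memory(arr, batches[0]))  # True
print(np.array_equal(arr, original))      # True
```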

Answer by proton

Do it like this:

a = [[1, 2, 3],[4, 5, 6],
     [7, 8, 9],[10, 11, 12]]
b = a[0:2]
c = a[2:4]
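
The same slicing idea can be generalized so the slice bounds are not written by hand. This is only a sketch; batch_size is an assumed parameter name, and the input is the plain list from the snippet above.

```python
a = [[1, 2, 3], [4, 5, 6],
     [7, 8, 9], [10, 11, 12]]

batch_size = 2  # assumed batch size
# Step through the list in strides of batch_size and slice out each batch.
batches = [a[i:i + batch_size] for i in range(0, len(a), batch_size)]

for batch in batches:
    print(batch)
# [[1, 2, 3], [4, 5, 6]]
# [[7, 8, 9], [10, 11, 12]]
```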

Answer by abasar

This is what I have used to iterate. I use the b.next() method to generate the indices, then pass the output to index a numpy array, for example a[b.next()], where a is a numpy array.

class Batch():    
    def __init__(self, total, batch_size):
        self.total = total
        self.batch_size = batch_size
        self.current = 0

    def next(self):
        max_index = self.current + self.batch_size
        indices = [i if i < self.total else i - self.total 
                       for i in range(self.current, max_index)]
        self.current = max_index % self.total
        return indices 

b = Batch(10, 3)
print(b.next()) # [0, 1, 2]
print(b.next()) # [3, 4, 5]
print(b.next()) # [6, 7, 8]
print(b.next()) # [9, 0, 1]
print(b.next()) # [2, 3, 4]
print(b.next()) # [5, 6, 7]
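
For comparison, the wrap-around indexing can also be delegated to numpy. The sketch below swaps in np.take with mode="wrap" instead of the Batch class; the helper name wrapped_batch and the sample array are assumptions made for illustration. Unlike plain slicing, this returns a copy of the selected rows.

```python
import numpy as np

a = np.arange(30).reshape(10, 3)  # 10 rows, matching Batch(10, 3)

def wrapped_batch(a, start, batch_size):
    """Return batch_size rows starting at `start`, wrapping past the end."""
    indices = np.arange(start, start + batch_size)
    # mode="wrap" maps out-of-range indices back into range (10 -> 0, 11 -> 1).
    return np.take(a, indices, axis=0, mode="wrap")

batch = wrapped_batch(a, start=9, batch_size=3)  # rows 9, 0, 1
print(batch[:, 0])  # [27  0  3]
```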

Answer by guorui

To avoid the error "array split does not result in an equal division",

np.array_split(arr, n, axis=0)

is better than np.split(arr, n, axis=0).

For example,

a = np.array([[170,  52, 204],
              [114, 235, 191],
              [ 63, 145, 171],
              [ 16,  97, 173]])

then

print(np.array_split(a, 2))

[array([[170,  52, 204],
       [114, 235, 191]]), array([[ 63, 145, 171],
       [ 16,  97, 173]])]

print(np.array_split(a, 3))

[array([[170,  52, 204],
       [114, 235, 191]]), array([[ 63, 145, 171]]), array([[ 16,  97, 173]])]

However, print(np.split(a, 3)) will raise an error since 4/3 is not an integer.
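
A short side-by-side sketch of the two functions on a four-row array (the array contents are an arbitrary stand-in): np.split refuses the uneven division, while np.array_split returns parts of unequal length.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)  # 4 rows cannot be divided evenly by 3

error_message = None
try:
    np.split(a, 3)
except ValueError as exc:
    # np.split insists on equal-sized parts and raises otherwise.
    error_message = str(exc)
    print(error_message)

# np.array_split allows unequal parts; the earlier parts get the extra rows.
sizes = [part.shape[0] for part in np.array_split(a, 3)]
print(sizes)  # [2, 1, 1]
```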