Python: how to implement TensorFlow's next_batch for your own data
Original URL: http://stackoverflow.com/questions/40994583/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverFlow
How to implement TensorFlow's next_batch for your own data
Asked by blckbird
In the TensorFlow MNIST tutorial, the mnist.train.next_batch(100)
function comes in very handy. I am now trying to implement a simple classification myself. I have my training data in a numpy array. How could I implement a similar function for my own data to give me the next batch?
import tensorflow as tf

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
Xtr, Ytr = loadData()
for it in range(1000):
    batch_x = Xtr.next_batch(100)
    batch_y = Ytr.next_batch(100)
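Note that a plain numpy array has no next_batch method, and even if it did, drawing from Xtr and Ytr independently would break the feature/label pairing. The answers below therefore sample both arrays through a single helper, so the loop above ends up looking roughly like this (a sketch only; the next_batch function is defined in the answers):

Xtr, Ytr = loadData()
for it in range(1000):
    batch_x, batch_y = next_batch(100, Xtr, Ytr)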
Answered by edo
The link you posted says: "we get a 'batch' of one hundred random data points from our training set." In my example I use a global function (not a method like in your example), so there will be a difference in syntax.
In my function you'll need to pass the number of samples wanted and the data array.
Here is the correct code, which ensures samples have correct labels:
import numpy as np

def next_batch(num, data, labels):
    '''
    Return a total of `num` random samples and labels.
    '''
    idx = np.arange(0, len(data))
    np.random.shuffle(idx)
    idx = idx[:num]
    data_shuffle = [data[i] for i in idx]
    labels_shuffle = [labels[i] for i in idx]
    return np.asarray(data_shuffle), np.asarray(labels_shuffle)
Xtr, Ytr = np.arange(0, 10), np.arange(0, 100).reshape(10, 10)
print(Xtr)
print(Ytr)
Xtr, Ytr = next_batch(5, Xtr, Ytr)
print('\n5 random samples')
print(Xtr)
print(Ytr)
And a demo run:
[0 1 2 3 4 5 6 7 8 9]
[[ 0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]
[20 21 22 23 24 25 26 27 28 29]
[30 31 32 33 34 35 36 37 38 39]
[40 41 42 43 44 45 46 47 48 49]
[50 51 52 53 54 55 56 57 58 59]
[60 61 62 63 64 65 66 67 68 69]
[70 71 72 73 74 75 76 77 78 79]
[80 81 82 83 84 85 86 87 88 89]
[90 91 92 93 94 95 96 97 98 99]]
5 random samples
[9 1 5 6 7]
[[90 91 92 93 94 95 96 97 98 99]
[10 11 12 13 14 15 16 17 18 19]
[50 51 52 53 54 55 56 57 58 59]
[60 61 62 63 64 65 66 67 68 69]
[70 71 72 73 74 75 76 77 78 79]]
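One property of this function worth noting (an observation added here, not from the original answer): every call reshuffles all indices from scratch, so consecutive batches are drawn independently and may overlap; nothing guarantees that a run of calls visits every sample once per epoch. The next answer addresses exactly this. A quick illustration:

Xtr2, Ytr2 = np.arange(0, 10), np.arange(0, 100).reshape(10, 10)
b1, _ = next_batch(5, Xtr2, Ytr2)
b2, _ = next_batch(5, Xtr2, Ytr2)
print(set(b1) & set(b2))  # often non-empty: the two batches can share samples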
Answered by Brother_Mumu
To shuffle and sample each mini-batch properly, whether a sample has already been selected within the current epoch should also be tracked. Here is an implementation that uses the data from the answer above.
import numpy as np

class Dataset:
    def __init__(self, data):
        self._index_in_epoch = 0
        self._epochs_completed = 0
        self._data = data
        self._num_examples = data.shape[0]

    @property
    def data(self):
        return self._data

    def next_batch(self, batch_size, shuffle=True):
        start = self._index_in_epoch
        # shuffle for the first epoch
        if start == 0 and self._epochs_completed == 0:
            idx = np.arange(0, self._num_examples)  # get all possible indexes
            np.random.shuffle(idx)  # shuffle indexes
            self._data = self.data[idx]  # reorder the data accordingly

        # go to the next batch
        if start + batch_size > self._num_examples:
            # finished epoch
            self._epochs_completed += 1
            rest_num_examples = self._num_examples - start
            data_rest_part = self.data[start:self._num_examples]
            idx0 = np.arange(0, self._num_examples)  # get all possible indexes
            np.random.shuffle(idx0)  # shuffle indexes
            self._data = self.data[idx0]  # reshuffle the data for the new epoch

            start = 0
            # this also handles the case where the number of samples is not an
            # integer multiple of batch_size
            self._index_in_epoch = batch_size - rest_num_examples
            end = self._index_in_epoch
            data_new_part = self._data[start:end]
            return np.concatenate((data_rest_part, data_new_part), axis=0)
        else:
            self._index_in_epoch += batch_size
            end = self._index_in_epoch
            return self._data[start:end]

dataset = Dataset(np.arange(0, 10))
for i in range(10):
    print(dataset.next_batch(5))
The output is:
[2 8 6 3 4]
[1 5 9 0 7]
[1 7 3 0 8]
[2 6 5 9 4]
[1 0 4 8 3]
[7 6 2 9 5]
[9 5 4 6 2]
[0 1 8 7 3]
[9 7 8 1 6]
[3 5 2 4 0]
The first and second (third and fourth, ...) mini-batches together correspond to one whole epoch.
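A quick check of that claim (a sketch added here, not from the original answer): two consecutive batches of 5 should together cover all 10 samples exactly once:

dataset = Dataset(np.arange(0, 10))
first = dataset.next_batch(5)
second = dataset.next_batch(5)
print(sorted(np.concatenate((first, second))))  # [0, 1, 2, ..., 9]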
Answered by itsergiu
I use Anaconda and Jupyter. In Jupyter, if you run ?mnist you get:

File: c:\programdata\anaconda3\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\base.py
Docstring: Datasets(train, validation, test)
In the folder datasets you will find mnist.py, which contains all methods, including next_batch.
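A sketch of the same idea outside Jupyter (an addition here, assuming TensorFlow 1.x, where the contrib module still exists): Python's inspect module can locate the source file directly:

import inspect
from tensorflow.contrib.learn.python.learn.datasets import mnist
print(inspect.getsourcefile(mnist))  # path to mnist.py, which defines next_batch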
Answered by Sohaib Anwaar
I tried the algorithm from the answer marked above, but it did not give me results, so I searched on Kaggle and found a really amazing algorithm that worked very well. For the best result, try this. In the algorithm below, the global variables take the input you declared above, where you read in your data set.
epochs_completed = 0
index_in_epoch = 0
num_examples = X_train.shape[0]

# for splitting out batches of data
def next_batch(batch_size):
    global X_train
    global y_train
    global index_in_epoch
    global epochs_completed

    start = index_in_epoch
    index_in_epoch += batch_size

    # when all training data has already been used, it is reordered randomly
    if index_in_epoch > num_examples:
        # finished epoch
        epochs_completed += 1
        # shuffle the data
        perm = np.arange(num_examples)
        np.random.shuffle(perm)
        X_train = X_train[perm]
        y_train = y_train[perm]
        # start next epoch
        start = 0
        index_in_epoch = batch_size
        assert batch_size <= num_examples
    end = index_in_epoch
    return X_train[start:end], y_train[start:end]
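A hypothetical usage sketch (not part of the original answer; it assumes X_train and y_train already exist as numpy arrays before the globals above are set up):

for step in range(1000):
    batch_x, batch_y = next_batch(100)
    # feed batch_x and batch_y to sess.run(...) via feed_dict here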
Answered by Mike Gashler
Yet another implementation:
from typing import Tuple
import numpy as np

class BatchMaker(object):
    def __init__(self, feat: np.array, lab: np.array) -> None:
        if len(feat) != len(lab):
            raise ValueError("Expected feat and lab to have the same number of samples")
        self.feat = feat
        self.lab = lab
        self.indexes = np.arange(len(feat))
        np.random.shuffle(self.indexes)
        self.pos = 0

    # "BatchMaker, BatchMaker, make me a batch..."
    def next_batch(self, batch_size: int) -> Tuple[np.array, np.array]:
        if self.pos + batch_size > len(self.feat):
            np.random.shuffle(self.indexes)
            self.pos = 0
        batch_indexes = self.indexes[self.pos:self.pos + batch_size]
        self.pos += batch_size
        return self.feat[batch_indexes], self.lab[batch_indexes]
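A hypothetical usage example (added here; the random data is only illustrative):

feat = np.random.rand(10, 3)  # 10 samples with 3 features each
lab = np.arange(10)           # 10 matching labels
maker = BatchMaker(feat, lab)
batch_x, batch_y = maker.next_batch(4)
print(batch_x.shape, batch_y.shape)  # (4, 3) (4,)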
Answered by Aakash Saxena
If you do not want to get a shape mismatch error when running your TensorFlow session, use the function below instead of the function provided in the first solution above (https://stackoverflow.com/a/40995666/7748451):
import numpy as np

def next_batch(num, data, labels):
    '''
    Return a total of `num` random samples and labels.
    '''
    idx = np.arange(0, len(data))
    np.random.shuffle(idx)
    idx = idx[:num]
    data_shuffle = data[idx]
    labels_shuffle = labels[idx]
    # note: the .values call below assumes labels is a pandas Series/DataFrame
    labels_shuffle = np.asarray(labels_shuffle.values.reshape(len(labels_shuffle), 1))
    return data_shuffle, labels_shuffle
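A hypothetical usage sketch (an addition, not from the original answer; it constructs a pandas Series because of the .values call above):

import numpy as np
import pandas as pd

data = np.random.rand(100, 784)
labels = pd.Series(np.random.randint(0, 10, size=100))
batch_x, batch_y = next_batch(32, data, labels)
print(batch_x.shape, batch_y.shape)  # (32, 784) (32, 1)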