Python: how to implement TensorFlow's next_batch for your own data
Original URL: http://stackoverflow.com/questions/40994583/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverFlow
How to implement TensorFlow's next_batch for your own data
Asked by blckbird
In the TensorFlow MNIST tutorial, the mnist.train.next_batch(100)
function comes in very handy. I am now trying to implement a simple classification myself. I have my training data in a numpy array. How could I implement a similar function for my own data to give me the next batch?
import tensorflow as tf

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
Xtr, Ytr = loadData()
for it in range(1000):
    batch_x = Xtr.next_batch(100)
    batch_y = Ytr.next_batch(100)
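Note that a plain numpy array has no next_batch method, and even if it did, drawing from Xtr and Ytr independently would break the feature/label pairing. The answers below therefore sample both arrays through a single helper, so the loop above ends up looking roughly like this (a sketch only; the next_batch function is defined in the answers):

Xtr, Ytr = loadData()
for it in range(1000):
    batch_x, batch_y = next_batch(100, Xtr, Ytr)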
Answered by edo
The link you posted says: "we get a 'batch' of one hundred random data points from our training set." In my example I use a global function (not a method like in your example), so there will be a difference in syntax.
In my function you'll need to pass the number of samples wanted and the data array.
Here is the correct code, which ensures samples have correct labels:
import numpy as np

def next_batch(num, data, labels):
    '''
    Return a total of `num` random samples and labels.
    '''
    idx = np.arange(0, len(data))
    np.random.shuffle(idx)
    idx = idx[:num]
    data_shuffle = [data[i] for i in idx]
    labels_shuffle = [labels[i] for i in idx]
    return np.asarray(data_shuffle), np.asarray(labels_shuffle)
Xtr, Ytr = np.arange(0, 10), np.arange(0, 100).reshape(10, 10)
print(Xtr)
print(Ytr)
Xtr, Ytr = next_batch(5, Xtr, Ytr)
print('\n5 random samples')
print(Xtr)
print(Ytr)
And a demo run:
[0 1 2 3 4 5 6 7 8 9]
[[ 0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]
[20 21 22 23 24 25 26 27 28 29]
[30 31 32 33 34 35 36 37 38 39]
[40 41 42 43 44 45 46 47 48 49]
[50 51 52 53 54 55 56 57 58 59]
[60 61 62 63 64 65 66 67 68 69]
[70 71 72 73 74 75 76 77 78 79]
[80 81 82 83 84 85 86 87 88 89]
[90 91 92 93 94 95 96 97 98 99]]
5 random samples
[9 1 5 6 7]
[[90 91 92 93 94 95 96 97 98 99]
[10 11 12 13 14 15 16 17 18 19]
[50 51 52 53 54 55 56 57 58 59]
[60 61 62 63 64 65 66 67 68 69]
[70 71 72 73 74 75 76 77 78 79]]
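One property of this function worth noting (an observation added here, not from the original answer): every call reshuffles all indices from scratch, so consecutive batches are drawn independently and may overlap; nothing guarantees that a run of calls visits every sample once per epoch. The next answer addresses exactly this. A quick illustration:

Xtr2, Ytr2 = np.arange(0, 10), np.arange(0, 100).reshape(10, 10)
b1, _ = next_batch(5, Xtr2, Ytr2)
b2, _ = next_batch(5, Xtr2, Ytr2)
print(set(b1) & set(b2))  # often non-empty: the two batches can share samples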
Answered by Brother_Mumu
To shuffle and sample each mini-batch properly, whether a sample has already been selected within the current epoch should also be tracked. Here is an implementation that uses the data from the answer above.
import numpy as np

class Dataset:
    def __init__(self, data):
        self._index_in_epoch = 0
        self._epochs_completed = 0
        self._data = data
        self._num_examples = data.shape[0]

    @property
    def data(self):
        return self._data

    def next_batch(self, batch_size, shuffle=True):
        start = self._index_in_epoch
        # shuffle for the first epoch
        if start == 0 and self._epochs_completed == 0:
            idx = np.arange(0, self._num_examples)  # get all possible indexes
            np.random.shuffle(idx)  # shuffle indexes
            self._data = self.data[idx]  # reorder the data accordingly

        # go to the next batch
        if start + batch_size > self._num_examples:
            # finished epoch
            self._epochs_completed += 1
            rest_num_examples = self._num_examples - start
            data_rest_part = self.data[start:self._num_examples]
            idx0 = np.arange(0, self._num_examples)  # get all possible indexes
            np.random.shuffle(idx0)  # shuffle indexes
            self._data = self.data[idx0]  # reshuffle the data for the new epoch

            start = 0
            # this also handles the case where the number of samples is not an
            # integer multiple of batch_size
            self._index_in_epoch = batch_size - rest_num_examples
            end = self._index_in_epoch
            data_new_part = self._data[start:end]
            return np.concatenate((data_rest_part, data_new_part), axis=0)
        else:
            self._index_in_epoch += batch_size
            end = self._index_in_epoch
            return self._data[start:end]

dataset = Dataset(np.arange(0, 10))
for i in range(10):
    print(dataset.next_batch(5))
The output is:
[2 8 6 3 4]
[1 5 9 0 7]
[1 7 3 0 8]
[2 6 5 9 4]
[1 0 4 8 3]
[7 6 2 9 5]
[9 5 4 6 2]
[0 1 8 7 3]
[9 7 8 1 6]
[3 5 2 4 0]
The first and second (third and fourth, ...) mini-batches together correspond to one whole epoch.
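A quick check of that claim (a sketch added here, not from the original answer): two consecutive batches of 5 should together cover all 10 samples exactly once:

dataset = Dataset(np.arange(0, 10))
first = dataset.next_batch(5)
second = dataset.next_batch(5)
print(sorted(np.concatenate((first, second))))  # [0, 1, 2, ..., 9]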
Answered by itsergiu
I use Anaconda and Jupyter. In Jupyter, if you run ?mnist you get:

File: c:\programdata\anaconda3\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\base.py
Docstring: Datasets(train, validation, test)
In the folder datasets you will find mnist.py, which contains all methods, including next_batch.
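A sketch of the same idea outside Jupyter (an addition here, assuming TensorFlow 1.x, where the contrib module still exists): Python's inspect module can locate the source file directly:

import inspect
from tensorflow.contrib.learn.python.learn.datasets import mnist
print(inspect.getsourcefile(mnist))  # path to mnist.py, which defines next_batch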
Answered by Sohaib Anwaar
I tried the algorithm from the answer marked above, but it did not give me results, so I searched on Kaggle and found a really amazing algorithm that worked very well. For the best result, try this. In the algorithm below, the global variables take the input you declared above, where you read in your data set.
epochs_completed = 0
index_in_epoch = 0
num_examples = X_train.shape[0]

# for splitting out batches of data
def next_batch(batch_size):
    global X_train
    global y_train
    global index_in_epoch
    global epochs_completed

    start = index_in_epoch
    index_in_epoch += batch_size

    # when all training data has already been used, it is reordered randomly
    if index_in_epoch > num_examples:
        # finished epoch
        epochs_completed += 1
        # shuffle the data
        perm = np.arange(num_examples)
        np.random.shuffle(perm)
        X_train = X_train[perm]
        y_train = y_train[perm]
        # start next epoch
        start = 0
        index_in_epoch = batch_size
        assert batch_size <= num_examples
    end = index_in_epoch
    return X_train[start:end], y_train[start:end]
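A hypothetical usage sketch (not part of the original answer; it assumes X_train and y_train already exist as numpy arrays before the globals above are set up):

for step in range(1000):
    batch_x, batch_y = next_batch(100)
    # feed batch_x and batch_y to sess.run(...) via feed_dict here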
Answered by Mike Gashler
Yet another implementation:
from typing import Tuple
import numpy as np

class BatchMaker(object):
    def __init__(self, feat: np.array, lab: np.array) -> None:
        if len(feat) != len(lab):
            raise ValueError("Expected feat and lab to have the same number of samples")
        self.feat = feat
        self.lab = lab
        self.indexes = np.arange(len(feat))
        np.random.shuffle(self.indexes)
        self.pos = 0

    # "BatchMaker, BatchMaker, make me a batch..."
    def next_batch(self, batch_size: int) -> Tuple[np.array, np.array]:
        if self.pos + batch_size > len(self.feat):
            np.random.shuffle(self.indexes)
            self.pos = 0
        batch_indexes = self.indexes[self.pos:self.pos + batch_size]
        self.pos += batch_size
        return self.feat[batch_indexes], self.lab[batch_indexes]
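A hypothetical usage example (added here; the random data is only illustrative):

feat = np.random.rand(10, 3)  # 10 samples with 3 features each
lab = np.arange(10)           # 10 matching labels
maker = BatchMaker(feat, lab)
batch_x, batch_y = maker.next_batch(4)
print(batch_x.shape, batch_y.shape)  # (4, 3) (4,)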
Answered by Aakash Saxena
If you do not want to get a shape mismatch error when running your TensorFlow session, use the function below instead of the function provided in the first solution above (https://stackoverflow.com/a/40995666/7748451):
import numpy as np

def next_batch(num, data, labels):
    '''
    Return a total of `num` random samples and labels.
    '''
    idx = np.arange(0, len(data))
    np.random.shuffle(idx)
    idx = idx[:num]
    data_shuffle = data[idx]
    labels_shuffle = labels[idx]
    # note: the .values call below assumes labels is a pandas Series/DataFrame
    labels_shuffle = np.asarray(labels_shuffle.values.reshape(len(labels_shuffle), 1))
    return data_shuffle, labels_shuffle
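A hypothetical usage sketch (an addition, not from the original answer; it constructs a pandas Series because of the .values call above):

import numpy as np
import pandas as pd

data = np.random.rand(100, 784)
labels = pd.Series(np.random.randint(0, 10, size=100))
batch_x, batch_y = next_batch(32, data, labels)
print(batch_x.shape, batch_y.shape)  # (32, 784) (32, 1)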