Python: How does data normalization work in Keras during prediction?
Original question: http://stackoverflow.com/questions/41855512/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How does data normalization work in keras during prediction?
Asked by Alex Taylor
I see that the ImageDataGenerator allows me to specify different styles of data normalization, e.g. featurewise_center, samplewise_center, etc.
I see from the examples that if I specify one of these options, I need to call the fit method on the generator so that it can compute statistics, such as the mean image, from the training data.
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True)

# compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied)
datagen.fit(X_train)

# fits the model on batches with real-time data augmentation:
model.fit_generator(datagen.flow(X_train, Y_train, batch_size=32),
                    samples_per_epoch=len(X_train), nb_epoch=nb_epoch)
My question is: how does prediction work if I have specified data normalization during training? I can't see how, within the framework, I would pass the training set mean and standard deviation along to predict so that I could normalize my test data myself, and I also don't see where in the training code this information is stored.
Are the image statistics needed for normalization stored in the model so that they can be used during prediction?
Answered by Marcin Możejko
Yes, it is a real downside of Keras's ImageDataGenerator that you cannot provide the standardization statistics yourself. However, there is an easy way to work around this issue.
Assuming that you have a function normalize(x) that normalizes an image batch (remember that the generator does not yield a single image but an array of images, a batch with shape (nr_of_examples_in_batch, image_dims ..)), you could make your own generator with normalization by using:
def gen_with_norm(gen, normalize):
    for x, y in gen:
        yield normalize(x), y
Then you can simply use gen_with_norm(datagen.flow(X_train, Y_train, batch_size=32), normalize) instead of datagen.flow(X_train, Y_train, batch_size=32). Note that the wrapper needs the iterator returned by calling datagen.flow, not the method itself.
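To see how the wrapper behaves, here is a minimal, self-contained sketch that uses plain Python lists in place of image arrays. The names toy_batches and scale_to_unit are made up for illustration and are not part of Keras; in practice the first argument would be the iterator returned by datagen.flow, which, unlike this toy generator, loops over the data indefinitely.

```python
def gen_with_norm(gen, normalize):
    """Yield (normalize(x), y) for every (x, y) batch produced by gen."""
    for x, y in gen:
        yield normalize(x), y


def toy_batches():
    """Hypothetical stand-in for datagen.flow(...): yields (inputs, labels)."""
    yield [0.0, 127.5, 255.0], [0, 1, 0]
    yield [51.0, 204.0], [1, 1]


def scale_to_unit(batch):
    """Hypothetical normalize function: map values from [0, 255] to [0, 1]."""
    return [value / 255.0 for value in batch]


# Labels pass through untouched; only the inputs are normalized.
normalized = list(gen_with_norm(toy_batches(), scale_to_unit))
for x, y in normalized:
    print(x, y)
```

Because the wrapper only touches x, any augmentation done by the underlying generator is preserved and the labels stay aligned with their inputs.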
Moreover, you can recover the mean and std computed by the fit method by reading them from the corresponding attributes of datagen (e.g. datagen.mean and datagen.std).
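One way to turn stored statistics into a normalize function is sketched below using NumPy only. The helper make_normalizer, the epsilon value, and the random arrays are assumptions for illustration; in practice mean and std would be read from datagen.mean and datagen.std after calling datagen.fit on the training data. This is not Keras's own code, only an imitation of per-channel centering and scaling.

```python
import numpy as np


def make_normalizer(mean, std, eps=1e-6):
    """Return a function that standardizes a batch with fixed statistics."""
    def normalize(batch):
        # Broadcasting applies the per-channel statistics to every pixel.
        return (batch - mean) / (std + eps)
    return normalize


rng = np.random.default_rng(0)
# Hypothetical data in channels-last layout: (samples, rows, cols, channels).
train = rng.uniform(0, 255, size=(100, 8, 8, 3))
test = rng.uniform(0, 255, size=(20, 8, 8, 3))

# Statistics come from the training set only, one value per channel.
mean = train.mean(axis=(0, 1, 2), keepdims=True)
std = train.std(axis=(0, 1, 2), keepdims=True)

normalize = make_normalizer(mean, std)
train_norm = normalize(train)
test_norm = normalize(test)  # reuses the training statistics
```

The resulting normalize function can be applied to test batches directly, or passed to a wrapper generator so that training and prediction share the same statistics.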
Answered by Martin Thoma
Use the standardize method of the generator for each element. Here is a complete example for CIFAR-10:
#!/usr/bin/env python

import keras
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D

# input image dimensions
img_rows, img_cols, img_channels = 32, 32, 3
num_classes = 10

batch_size = 32
epochs = 1

# The data, shuffled and split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()

model.add(Conv2D(32, (3, 3), padding='same', activation='relu',
                 input_shape=x_train.shape[1:]))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

datagen = ImageDataGenerator(zca_whitening=True)

# Compute principal components required for ZCA
datagen.fit(x_train)

# Apply normalization (ZCA and others)
print(x_test.shape)
for i in range(len(x_test)):
    # this is what you are looking for
    x_test[i] = datagen.standardize(x_test[i])
print(x_test.shape)

# Fit the model on the batches generated by datagen.flow().
model.fit_generator(datagen.flow(x_train, y_train,
                                 batch_size=batch_size),
                    steps_per_epoch=x_train.shape[0] // batch_size,
                    epochs=epochs,
                    validation_data=(x_test, y_test))
Answered by Hari
I am using the datagen.fit function itself.
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True)
train_datagen.fit(train_data)

test_datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True)
test_datagen.fit(train_data)
Ideally, test_datagen, having been fitted on the training dataset, will learn the training dataset's statistics. It will then use these statistics to normalize the test data.
Answered by Alexander Pacha
I had the same issue and solved it using the same functionality that the ImageDataGenerator uses:
from keras import backend as K
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator

# Load Cifar-10 dataset
(trainX, trainY), (testX, testY) = cifar10.load_data()
# Cast to float so the in-place operations below do not fail on uint8 data
testX = testX.astype('float32')

generator = ImageDataGenerator(featurewise_center=True,
                               featurewise_std_normalization=True)

# Calculate statistics on train dataset
generator.fit(trainX)

# Apply featurewise_center to test-data with statistics from train data
testX -= generator.mean
# Apply featurewise_std_normalization to test-data with statistics from train data
testX /= (generator.std + K.epsilon())

# Do your regular fitting
model.fit_generator(..., validation_data=(testX, testY), ...)
Note that this is only possible if you have a reasonably small dataset, like CIFAR-10. Otherwise, the solution proposed by Marcin sounds more reasonable.