Python: How does data normalization work in Keras during prediction?

Question

Asked by Alex Taylor

I see that ImageDataGenerator allows me to specify different styles of data normalization, e.g. featurewise_center, samplewise_center, etc.

I see from the examples that if I specify one of these options, I need to call the fit method on the generator so that it can compute statistics such as the mean image.

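# Keras 1.x-era API, as used in the original question
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import np_utils

nb_classes = 10  # CIFAR-10 has 10 classes
nb_epoch = 200   # number of training epochs (example value)

(X_train, y_train), (X_test, y_test) = cifar10.load_data()
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True)

# compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied)
datagen.fit(X_train)

# fits the model on batches with real-time data augmentation
# (`model` is assumed to be defined and compiled beforehand):
model.fit_generator(datagen.flow(X_train, Y_train, batch_size=32),
                    samples_per_epoch=len(X_train), nb_epoch=nb_epoch)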

My question is: how does prediction work if I have specified data normalization during training? I can't see how the framework would even let me pass the training-set mean/standard deviation along to predict so that I could normalize my test data myself, but I also don't see where in the training code this information is stored.

Are the image statistics needed for normalization stored in the model so that they can be used during prediction?

Answer 1

Answer by Marcin Możejko

Yes, this is a really big downside of Keras' ImageDataGenerator: you can't provide the standardization statistics on your own. But there is an easy way to overcome this issue.

Assume you have a function normalize(x) that normalizes an image batch (remember that the generator does not yield a single image but an array of images, i.e. a batch with shape (nr_of_examples_in_batch, image_dims ..)). You can then make your own generator with normalization:

def gen_with_norm(gen, normalize):
    for x, y in gen:
        yield normalize(x), y

Then you can simply use gen_with_norm(datagen.flow(X_train, Y_train, batch_size=32), normalize) instead of datagen.flow(X_train, Y_train, batch_size=32).
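
A minimal, NumPy-only sketch of the wrapper idea, assuming per-channel statistics computed once from channels-last training images (N, H, W, C). The names make_normalize and toy_batches, and the eps value, are illustrative assumptions, not Keras API; toy_batches only stands in for datagen.flow(...).

```python
import numpy as np


def make_normalize(train_x, eps=1e-6):
    """Build a normalize(x) function from training-set statistics.

    Statistics are computed once, per channel, over the sample, row and
    column axes of a channels-last array (N, H, W, C).
    """
    mean = train_x.mean(axis=(0, 1, 2), keepdims=True)[0]  # shape (1, 1, C)
    std = train_x.std(axis=(0, 1, 2), keepdims=True)[0]    # shape (1, 1, C)

    def normalize(x):
        # x is a whole batch of shape (batch_size, H, W, C); broadcasting
        # applies the same training statistics to every image in it.
        return (x - mean) / (std + eps)

    return normalize


def gen_with_norm(gen, normalize):
    # Wrap any (x, y) batch generator and normalize x on the fly.
    for x, y in gen:
        yield normalize(x), y


def toy_batches(x, y, batch_size):
    # Hypothetical stand-in for datagen.flow(...): yields each batch once.
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size], y[start:start + batch_size]


rng = np.random.default_rng(0)
train_x = rng.normal(loc=5.0, scale=2.0, size=(64, 8, 8, 3)).astype("float32")
train_y = np.arange(64)
normalize = make_normalize(train_x)

batches = list(gen_with_norm(toy_batches(train_x, train_y, 16), normalize))
all_x = np.concatenate([bx for bx, _ in batches])
print(all_x.shape, all_x.mean(axis=(0, 1, 2)).round(3))
```

The same normalize function can be applied to test batches before calling predict, which is what keeps training and prediction consistent.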

Moreover, you can recover the mean and std computed by the fit method from the corresponding fields of datagen (e.g. datagen.mean and datagen.std).
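
Since these fields are plain NumPy arrays, one way to carry them over to prediction time is to save them next to the model files. A minimal sketch, assuming the statistics have already been read from a fitted generator; here they are replaced by per-channel stand-ins computed from toy data, and the file name norm_stats.npz is arbitrary.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(1)
x_train = rng.uniform(0, 255, size=(32, 4, 4, 3)).astype("float32")

# Stand-ins for datagen.mean / datagen.std after datagen.fit(x_train):
# per-channel statistics with shape (1, 1, C).
train_mean = x_train.mean(axis=(0, 1, 2)).reshape(1, 1, 3)
train_std = x_train.std(axis=(0, 1, 2)).reshape(1, 1, 3)

# Training side: store the statistics alongside the model.
stats_path = os.path.join(tempfile.mkdtemp(), "norm_stats.npz")
np.savez(stats_path, mean=train_mean, std=train_std)

# Prediction side (possibly another process): reload and apply the
# *training* statistics to new data before calling model.predict.
stats = np.load(stats_path)
x_new = rng.uniform(0, 255, size=(5, 4, 4, 3)).astype("float32")
x_new_norm = (x_new - stats["mean"]) / (stats["std"] + 1e-6)
print(x_new_norm.shape)
```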

Answer 2

Answer by Martin Thoma

Use the standardize method of the generator on each element. Here is a complete example for CIFAR-10:

#!/usr/bin/env python

import keras
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D

# input image dimensions
img_rows, img_cols, img_channels = 32, 32, 3
num_classes = 10

batch_size = 32
epochs = 1

# The data, shuffled and split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()

model.add(Conv2D(32, (3, 3), padding='same', activation='relu',
                 input_shape=x_train.shape[1:]))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

datagen = ImageDataGenerator(zca_whitening=True)

# Compute principal components required for ZCA
datagen.fit(x_train)

# Apply normalization (ZCA and others)
print(x_test.shape)
for i in range(len(x_test)):
    # this is what you are looking for
    x_test[i] = datagen.standardize(x_test[i])
print(x_test.shape)

# Fit the model on the batches generated by datagen.flow().
model.fit_generator(datagen.flow(x_train, y_train,
                                 batch_size=batch_size),
                    steps_per_epoch=x_train.shape[0] // batch_size,
                    epochs=epochs,
                    validation_data=(x_test, y_test))
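
For intuition about what the ZCA step stores, here is a NumPy-only sketch of textbook ZCA whitening, W = U diag(1/sqrt(s + eps)) U^T, estimated once from centered training data and then reused unchanged on test data. That reuse is why the loop above must call standardize on the generator fitted on x_train. This is an illustration under that textbook formulation, not a copy of Keras internals; fit_zca and apply_zca are hypothetical names.

```python
import numpy as np


def fit_zca(x_train, eps=1e-6):
    """Estimate the mean and ZCA whitening matrix from training data."""
    flat = x_train.reshape(len(x_train), -1)
    mean = flat.mean(axis=0)
    centered = flat - mean
    sigma = centered.T @ centered / len(centered)  # feature covariance
    u, s, _ = np.linalg.svd(sigma)                 # sigma = U diag(s) U^T
    whitening = (u / np.sqrt(s + eps)) @ u.T       # U diag(1/sqrt(s+eps)) U^T
    return mean, whitening


def apply_zca(x, mean, whitening):
    """Whiten any data (train or test) with the *training* statistics."""
    flat = x.reshape(len(x), -1)
    return ((flat - mean) @ whitening).reshape(x.shape)


rng = np.random.default_rng(2)
# Correlated toy "images" of shape (2, 2), i.e. 4 features per sample.
mixing = np.array([[2.0, 0.5, 0.0, 0.0],
                   [0.0, 1.0, 0.3, 0.0],
                   [0.0, 0.0, 1.5, 0.2],
                   [0.0, 0.0, 0.0, 0.8]])
x_train = (rng.normal(size=(2000, 4)) @ mixing).reshape(2000, 2, 2)
x_test = (rng.normal(size=(10, 4)) @ mixing).reshape(10, 2, 2)

mean, whitening = fit_zca(x_train)
white_train = apply_zca(x_train, mean, whitening).reshape(len(x_train), -1)
cov = white_train.T @ white_train / len(white_train)  # close to identity
white_test = apply_zca(x_test, mean, whitening)      # same statistics reused
print(np.round(cov, 3))
```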

Answer 3

Answer by Hari

I am using the datagen.fit function itself.

from keras.preprocessing.image import ImageDataGenerator

# train_data: the training images as a NumPy array
train_datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True)
train_datagen.fit(train_data)

# Also fitted on train_data, so it normalizes test data
# with the training-set statistics
test_datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True)
test_datagen.fit(train_data)

With this, test_datagen, fitted on the training dataset, learns the training dataset's statistics. It then uses these statistics to normalize the test data.
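
The key point is that the test-time generator holds statistics from the training data, not from the test data. A small NumPy sketch of that distinction, using a hypothetical ChannelStandardizer class that only mimics featurewise_center plus featurewise_std_normalization (per-channel mean and std over a channels-last batch); it is not the Keras class.

```python
import numpy as np


class ChannelStandardizer:
    """Hypothetical stand-in for a featurewise ImageDataGenerator."""

    def __init__(self, eps=1e-6):
        self.eps = eps
        self.mean = None
        self.std = None

    def fit(self, x):
        # Per-channel statistics over samples, rows and columns (N, H, W, C).
        self.mean = x.mean(axis=(0, 1, 2))
        self.std = x.std(axis=(0, 1, 2))
        return self

    def standardize(self, x):
        return (x - self.mean) / (self.std + self.eps)


rng = np.random.default_rng(3)
train_data = rng.normal(10.0, 3.0, size=(100, 4, 4, 3))
test_data = rng.normal(12.0, 3.0, size=(20, 4, 4, 3))  # slightly shifted

train_gen = ChannelStandardizer().fit(train_data)
test_gen = ChannelStandardizer().fit(train_data)          # as in the answer
test_gen_on_test = ChannelStandardizer().fit(test_data)   # fitted on test data

# Two standardizers fitted on train_data transform test data identically...
same = np.allclose(train_gen.standardize(test_data),
                   test_gen.standardize(test_data))
# ...while fitting on the test set itself erases the shift the model would see.
shifted_mean = test_gen.standardize(test_data).mean()
erased_mean = test_gen_on_test.standardize(test_data).mean()
print(same, round(shifted_mean, 2), round(erased_mean, 2))
```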

Answer 4

Answer by Alexander Pacha

I had the same issue and solved it by applying the same operations that ImageDataGenerator uses:

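from keras import backend as K
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator

# Load Cifar-10 dataset
(trainX, trainY), (testX, testY) = cifar10.load_data()
# load_data() returns uint8 arrays; convert to float so the
# in-place subtraction and division below are allowed
trainX = trainX.astype('float32')
testX = testX.astype('float32')

generator = ImageDataGenerator(featurewise_center=True,
                               featurewise_std_normalization=True)

# Calculate statistics on train dataset
generator.fit(trainX)
# Apply featurewise_center to test data with statistics from train data
testX -= generator.mean
# Apply featurewise_std_normalization to test data with statistics from train data
testX /= (generator.std + K.epsilon())

# Do your regular fitting (`model` defined and compiled elsewhere)
model.fit_generator(..., validation_data=(testX, testY), ...)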

Note that this is only possible if you have a reasonably small dataset, like CIFAR-10. Otherwise the solution proposed by Marcin sounds more reasonable.
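
The size limit comes from holding the whole test array in memory for testX -= generator.mean. A hedged sketch of one way around it, assuming the data sits in a NumPy array or a writable np.memmap: apply the training statistics in place, one chunk at a time. normalize_in_chunks and chunk_size are illustrative names; the default eps of 1e-7 mirrors Keras' default K.epsilon() value.

```python
import numpy as np


def normalize_in_chunks(x, mean, std, chunk_size=1024, eps=1e-7):
    """Normalize x in place, chunk_size samples at a time.

    x may be a regular float array or a writable np.memmap; each slice
    below is a view, so the updates go straight into x.
    """
    for start in range(0, len(x), chunk_size):
        chunk = x[start:start + chunk_size]  # a view, not a copy
        chunk -= mean
        chunk /= (std + eps)
    return x


rng = np.random.default_rng(4)
test_x = rng.uniform(0, 255, size=(2500, 4, 4, 3)).astype("float32")
# Stand-ins for generator.mean / generator.std, shape (1, 1, C).
mean = np.full((1, 1, 3), 120.0, dtype="float32")
std = np.full((1, 1, 3), 60.0, dtype="float32")

expected = (test_x - mean) / (std + 1e-7)
out = normalize_in_chunks(test_x, mean, std, chunk_size=1000)
print(np.allclose(out, expected))
```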