Note: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow.
Original question: http://stackoverflow.com/questions/45393429/
Keras: How to save model and continue training?
Asked by David
I have a model that I've trained for 40 epochs. I kept a checkpoint for each epoch, and I have also saved the model with model.save(). The training code is:
# imports assumed by the snippets below (standalone Keras 2.x API)
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from keras.callbacks import ModelCheckpoint

n_units = 1000
model = Sequential()
model.add(LSTM(n_units, input_shape=(None, vec_size), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units))
model.add(Dropout(0.2))
model.add(Dense(vec_size, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')
# define the checkpoint
filepath="word2vec-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
# fit the model
model.fit(x, y, epochs=40, batch_size=50, callbacks=callbacks_list)
However, when I load the model and try training it again, it starts all over as if it hadn't been trained before. The loss doesn't resume from where the last training left off.
What confuses me is that when I redefine the model structure and use load_weights(), model.predict() works well. Thus, I believe the model weights were loaded:
model = Sequential()
model.add(LSTM(n_units, input_shape=(None, vec_size), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units))
model.add(Dropout(0.2))
model.add(Dense(vec_size, activation='linear'))
filename = "word2vec-39-0.0027.hdf5"
model.load_weights(filename)
model.compile(loss='mean_squared_error', optimizer='adam')
However, when I continue training with this, the loss is as high as at the initial stage:
filepath="word2vec-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
# fit the model
model.fit(x, y, epochs=40, batch_size=50, callbacks=callbacks_list)
I searched and found some examples of saving and loading models here and here. However, none of them work.
Update 1
I looked at this question, tried it, and it works:
model.save('partly_trained.h5')
del model
model = load_model('partly_trained.h5')
But when I close Python, reopen it, and run load_model again, it fails. The loss is as high as in the initial state.
Update 2
I tried Yu-Yang's example code and it works. However, when I use my own code again, it still fails.
This is the result from the original training. The second epoch should start with loss = 3.1***:
13700/13846 [============================>.] - ETA: 0s - loss: 3.0519
13750/13846 [============================>.] - ETA: 0s - loss: 3.0511
13800/13846 [============================>.] - ETA: 0s - loss: 3.0512Epoch 00000: loss improved from inf to 3.05101, saving model to LPT-00-3.0510.h5
13846/13846 [==============================] - 81s - loss: 3.0510
Epoch 2/60
50/13846 [..............................] - ETA: 80s - loss: 3.1754
100/13846 [..............................] - ETA: 78s - loss: 3.1174
150/13846 [..............................] - ETA: 78s - loss: 3.0745
I closed Python, reopened it, loaded the model with model = load_model("LPT-00-3.0510.h5"), and then trained with:
filepath="LPT-{epoch:02d}-{loss:.4f}.h5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
# fit the model
model.fit(x, y, epochs=60, batch_size=50, callbacks=callbacks_list)
The loss starts at 4.54:
Epoch 1/60
50/13846 [..............................] - ETA: 162s - loss: 4.5451
100/13846 [..............................] - ETA: 113s - loss: 4.3835
Answered by Yu-Yang
As it's quite difficult to clarify where the problem is, I created a toy example from your code, and it seems to work alright.
import numpy as np
from numpy.testing import assert_allclose
from keras.models import Sequential, load_model
from keras.layers import LSTM, Dropout, Dense
from keras.callbacks import ModelCheckpoint
vec_size = 100
n_units = 10
x_train = np.random.rand(500, 10, vec_size)
y_train = np.random.rand(500, vec_size)
model = Sequential()
model.add(LSTM(n_units, input_shape=(None, vec_size), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units))
model.add(Dropout(0.2))
model.add(Dense(vec_size, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')
# define the checkpoint
filepath = "model.h5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
# fit the model
model.fit(x_train, y_train, epochs=5, batch_size=50, callbacks=callbacks_list)
# load the model
new_model = load_model(filepath)
assert_allclose(model.predict(x_train),
                new_model.predict(x_train),
                1e-5)
# fit the model
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
new_model.fit(x_train, y_train, epochs=5, batch_size=50, callbacks=callbacks_list)
The loss continues to decrease after the model is loaded. (Restarting Python also causes no problem.)
Using TensorFlow backend.
Epoch 1/5
500/500 [==============================] - 2s - loss: 0.3216 Epoch 00000: loss improved from inf to 0.32163, saving model to model.h5
Epoch 2/5
500/500 [==============================] - 0s - loss: 0.2923 Epoch 00001: loss improved from 0.32163 to 0.29234, saving model to model.h5
Epoch 3/5
500/500 [==============================] - 0s - loss: 0.2542 Epoch 00002: loss improved from 0.29234 to 0.25415, saving model to model.h5
Epoch 4/5
500/500 [==============================] - 0s - loss: 0.2086 Epoch 00003: loss improved from 0.25415 to 0.20860, saving model to model.h5
Epoch 5/5
500/500 [==============================] - 0s - loss: 0.1725 Epoch 00004: loss improved from 0.20860 to 0.17249, saving model to model.h5
Epoch 1/5
500/500 [==============================] - 0s - loss: 0.1454 Epoch 00000: loss improved from inf to 0.14543, saving model to model.h5
Epoch 2/5
500/500 [==============================] - 0s - loss: 0.1289 Epoch 00001: loss improved from 0.14543 to 0.12892, saving model to model.h5
Epoch 3/5
500/500 [==============================] - 0s - loss: 0.1169 Epoch 00002: loss improved from 0.12892 to 0.11694, saving model to model.h5
Epoch 4/5
500/500 [==============================] - 0s - loss: 0.1097 Epoch 00003: loss improved from 0.11694 to 0.10971, saving model to model.h5
Epoch 5/5
500/500 [==============================] - 0s - loss: 0.1057 Epoch 00004: loss improved from 0.10971 to 0.10570, saving model to model.h5
BTW, redefining the model followed by load_weights() definitely won't work, because save_weights() and load_weights() do not save/load the optimizer state.
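To see the difference, here is a minimal sketch (hypothetical file names, reusing the compiled model from the toy example above): model.save() stores the optimizer state together with the weights, whereas save_weights() stores only the layer weights, so after load_weights() Adam's moment estimates start from scratch.
model.save('full.h5')                         # architecture + weights + optimizer state
restored = load_model('full.h5')
print(len(restored.optimizer.get_weights()))  # > 0: Adam's moment estimates were restored
model.save_weights('weights_only.h5')         # layer weights only; the optimizer state is lost,
                                              # so a model that uses load_weights() restarts Adam from scratch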
Answered by David
I compared my code with this example http://machinelearningmastery.com/text-generation-lstm-recurrent-neural-networks-python-keras/ by carefully blocking out lines one at a time and rerunning. After a whole day, I finally found what was wrong.
When making the char-to-int mapping, I used:
# title_str_reduced is a string
chars = list(set(title_str_reduced))
# make char to int index mapping
char2int = {}
for i in range(len(chars)):
    char2int[chars[i]] = i
A set is an unordered data structure. In Python, when a set is converted to a list (which is ordered), the order is given arbitrarily. Thus my char2int dictionary was randomized every time I reopened Python. I fixed my code by adding sorted():
chars = sorted(list(set(title_str_reduced)))
This forces the conversion to a fixed order.
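A small self-contained demo of the fix (with a hypothetical input string): set iteration order is not stable across interpreter runs, so the derived mapping can silently change between sessions, while sorted() pins it down.
title_str_reduced = "hello world"              # hypothetical input
chars = sorted(list(set(title_str_reduced)))   # deterministic, run-to-run stable order
char2int = {c: i for i, c in enumerate(chars)}
print(char2int)                                # identical mapping on every run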
Answered by Mrinal Jain
The answer above uses TensorFlow 1.x. Here is an updated version using TensorFlow 2.x.
import numpy as np
from numpy.testing import assert_allclose
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.callbacks import ModelCheckpoint
vec_size = 100
n_units = 10
x_train = np.random.rand(500, 10, vec_size)
y_train = np.random.rand(500, vec_size)
model = Sequential()
model.add(LSTM(n_units, input_shape=(None, vec_size), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units))
model.add(Dropout(0.2))
model.add(Dense(vec_size, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')
# define the checkpoint
filepath = "model.h5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
# fit the model
model.fit(x_train, y_train, epochs=5, batch_size=50, callbacks=callbacks_list)
# load the model
new_model = load_model("model.h5")
assert_allclose(model.predict(x_train),
                new_model.predict(x_train),
                1e-5)
# fit the model
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
new_model.fit(x_train, y_train, epochs=5, batch_size=50, callbacks=callbacks_list)
Answered by user30012
The check-marked answer is not correct; the real problem is more subtle.
When you create a ModelCheckpoint, check its best attribute:
cp1 = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
cp1.best
You will see that this is set to np.inf. Unfortunately, that is the best the callback can do: it has no record of the loss from your previous training.
When you retrain and recreate the ModelCheckpoint, if you call fit and the loss happens to be lower than the previously known value, it seems to work. In more complex problems this is not the case, so you will end up saving a bad model and losing the best one.
The correct fix, with the modification shown below:
import numpy as np
from numpy.testing import assert_allclose
from keras.models import Sequential, load_model
from keras.layers import LSTM, Dropout, Dense
from keras.callbacks import ModelCheckpoint
vec_size = 100
n_units = 10
x_train = np.random.rand(500, 10, vec_size)
y_train = np.random.rand(500, vec_size)
model = Sequential()
model.add(LSTM(n_units, input_shape=(None, vec_size), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units))
model.add(Dropout(0.2))
model.add(Dense(vec_size, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')
# define the checkpoint
filepath = "model.h5"
cp1= ModelCheckpoint(filepath=filepath, monitor='loss', save_best_only=True, verbose=1, mode='min')
callbacks_list = [cp1]
# fit the model
model.fit(x_train, y_train, epochs=5, batch_size=50, shuffle=True, validation_split=0.1, callbacks=callbacks_list)
# load the model
new_model = load_model(filepath)
#assert_allclose(model.predict(x_train),new_model.predict(x_train), 1e-5)
score = model.evaluate(x_train, y_train, batch_size=50)
cp1 = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
cp1.best = score # <== ****THIS IS THE KEY **** See source for ModelCheckpoint
# fit the model
callbacks_list = [cp1]
new_model.fit(x_train, y_train, epochs=5, batch_size=50, callbacks=callbacks_list)
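If re-evaluating on the training data is expensive, an alternative sketch (assuming the question's checkpoint naming pattern "word2vec-{epoch:02d}-{loss:.4f}.hdf5") is to recover the previous best loss from the checkpoint filename before seeding the callback:
import re
filename = "word2vec-39-0.0027.hdf5"                   # hypothetical checkpoint from the question
match = re.match(r".*-(\d+)-([\d.]+)\.hdf5", filename)
initial_epoch = int(match.group(1)) + 1                # resume epoch numbering after epoch 39
cp1 = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
cp1.best = float(match.group(2))                       # seed the callback with the known best loss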
Answered by Anubhav Apurva
Here is the official Keras documentation on saving a model:
https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model
In this post the author provides two examples of saving and loading your model to a file, as shown in the sketch after this list:
- JSON format.
- YAML format.
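For reference, a minimal sketch of the JSON round trip (assuming model is a compiled Keras model, as in the examples above): note that to_json() serializes only the architecture, so the weights must be saved separately and the model recompiled before further training.
from keras.models import model_from_json

json_string = model.to_json()       # architecture only
model.save_weights('weights.h5')    # weights go to a separate file

rebuilt = model_from_json(json_string)
rebuilt.load_weights('weights.h5')
rebuilt.compile(loss='mean_squared_error', optimizer='adam')  # recompile before training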
Answered by bruce
I think you can write
model.save('partly_trained.h5')
and
model = load_model('partly_trained.h5')
instead of
model = Sequential()
model.add(LSTM(n_units, input_shape=(None, vec_size), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units))
model.add(Dropout(0.2))
model.add(Dense(vec_size, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')
Then continue training, because model.save stores both the architecture and the weights (along with the optimizer state).
Answered by MH304
Assume you have code like this:
model = some_model_you_made(input_img)  # you compiled your model in this
model.summary()
model_checkpoint = ModelCheckpoint('yours.h5', monitor='val_loss', verbose=1, save_best_only=True)
model_json = model.to_json()
with open("yours.json", "w") as json_file:
    json_file.write(model_json)
model.fit_generator(...)  # or model.fit(...)
Now turn your code into this:
model = some_model_you_made(input_img)  # same model here
model.summary()
model_checkpoint = ModelCheckpoint('yours.h5', monitor='val_loss', verbose=1, save_best_only=True)  # same checkpoint
model_json = model.to_json()
with open("yours.json", "w") as json_file:
    json_file.write(model_json)
with open('yours.json', 'r') as f:
    old_model = model_from_json(f.read())  # open the model you just saved (same as your last train) with a different name
old_model.load_weights('yours.h5')  # the model checkpoint you trained before
old_model.compile(...)  # need to compile again (exactly like the last compile)
# now start training with the checkpoint...
old_model.fit_generator(...)  # same stuff as the last train; or old_model.fit(...)