Note: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow.
Original question: http://stackoverflow.com/questions/45393429/
Keras: How to save model and continue training?
Asked by David
I have a model that I've trained for 40 epochs. I kept a checkpoint for each epoch, and I have also saved the model with model.save(). The training code is:
# imports assumed by the snippets below (standalone Keras 2.x API)
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from keras.callbacks import ModelCheckpoint

n_units = 1000
model = Sequential()
model.add(LSTM(n_units, input_shape=(None, vec_size), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units))
model.add(Dropout(0.2))
model.add(Dense(vec_size, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')
# define the checkpoint
filepath="word2vec-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
# fit the model
model.fit(x, y, epochs=40, batch_size=50, callbacks=callbacks_list)
However, when I load the model and try training it again, it starts all over as if it hadn't been trained before. The loss doesn't resume from where the last training left off.
What confuses me is that when I redefine the model structure and use load_weights(), model.predict() works well. Thus, I believe the model weights were loaded:
model = Sequential()
model.add(LSTM(n_units, input_shape=(None, vec_size), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units))
model.add(Dropout(0.2))
model.add(Dense(vec_size, activation='linear'))
filename = "word2vec-39-0.0027.hdf5"
model.load_weights(filename)
model.compile(loss='mean_squared_error', optimizer='adam')
However, when I continue training with this, the loss is as high as at the initial stage:
filepath="word2vec-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
# fit the model
model.fit(x, y, epochs=40, batch_size=50, callbacks=callbacks_list)
I searched and found some examples of saving and loading models here and here. However, none of them work.
Update 1
I looked at this question, tried it, and it works:
model.save('partly_trained.h5')
del model
model = load_model('partly_trained.h5')
But when I close Python, reopen it, and run load_model again, it fails. The loss is as high as in the initial state.
Update 2
I tried Yu-Yang's example code and it works. However, when I use my own code again, it still fails.
This is the result from the original training. The second epoch should start with loss = 3.1***:
13700/13846 [============================>.] - ETA: 0s - loss: 3.0519
13750/13846 [============================>.] - ETA: 0s - loss: 3.0511
13800/13846 [============================>.] - ETA: 0s - loss: 3.0512Epoch 00000: loss improved from inf to 3.05101, saving model to LPT-00-3.0510.h5
13846/13846 [==============================] - 81s - loss: 3.0510
Epoch 2/60
50/13846 [..............................] - ETA: 80s - loss: 3.1754
100/13846 [..............................] - ETA: 78s - loss: 3.1174
150/13846 [..............................] - ETA: 78s - loss: 3.0745
I closed Python, reopened it, loaded the model with model = load_model("LPT-00-3.0510.h5"), and then trained with:
filepath="LPT-{epoch:02d}-{loss:.4f}.h5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
# fit the model
model.fit(x, y, epochs=60, batch_size=50, callbacks=callbacks_list)
The loss starts at 4.54:
Epoch 1/60
50/13846 [..............................] - ETA: 162s - loss: 4.5451
100/13846 [..............................] - ETA: 113s - loss: 4.3835
Answered by Yu-Yang
As it's quite difficult to clarify where the problem is, I created a toy example from your code, and it seems to work alright.
import numpy as np
from numpy.testing import assert_allclose
from keras.models import Sequential, load_model
from keras.layers import LSTM, Dropout, Dense
from keras.callbacks import ModelCheckpoint
vec_size = 100
n_units = 10
x_train = np.random.rand(500, 10, vec_size)
y_train = np.random.rand(500, vec_size)
model = Sequential()
model.add(LSTM(n_units, input_shape=(None, vec_size), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units))
model.add(Dropout(0.2))
model.add(Dense(vec_size, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')
# define the checkpoint
filepath = "model.h5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
# fit the model
model.fit(x_train, y_train, epochs=5, batch_size=50, callbacks=callbacks_list)
# load the model
new_model = load_model(filepath)
assert_allclose(model.predict(x_train),
                new_model.predict(x_train),
                1e-5)
# fit the model
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
new_model.fit(x_train, y_train, epochs=5, batch_size=50, callbacks=callbacks_list)
The loss continues to decrease after the model is loaded. (Restarting Python also causes no problem.)
Using TensorFlow backend.
Epoch 1/5
500/500 [==============================] - 2s - loss: 0.3216 Epoch 00000: loss improved from inf to 0.32163, saving model to model.h5
Epoch 2/5
500/500 [==============================] - 0s - loss: 0.2923 Epoch 00001: loss improved from 0.32163 to 0.29234, saving model to model.h5
Epoch 3/5
500/500 [==============================] - 0s - loss: 0.2542 Epoch 00002: loss improved from 0.29234 to 0.25415, saving model to model.h5
Epoch 4/5
500/500 [==============================] - 0s - loss: 0.2086 Epoch 00003: loss improved from 0.25415 to 0.20860, saving model to model.h5
Epoch 5/5
500/500 [==============================] - 0s - loss: 0.1725 Epoch 00004: loss improved from 0.20860 to 0.17249, saving model to model.h5
Epoch 1/5
500/500 [==============================] - 0s - loss: 0.1454 Epoch 00000: loss improved from inf to 0.14543, saving model to model.h5
Epoch 2/5
500/500 [==============================] - 0s - loss: 0.1289 Epoch 00001: loss improved from 0.14543 to 0.12892, saving model to model.h5
Epoch 3/5
500/500 [==============================] - 0s - loss: 0.1169 Epoch 00002: loss improved from 0.12892 to 0.11694, saving model to model.h5
Epoch 4/5
500/500 [==============================] - 0s - loss: 0.1097 Epoch 00003: loss improved from 0.11694 to 0.10971, saving model to model.h5
Epoch 5/5
500/500 [==============================] - 0s - loss: 0.1057 Epoch 00004: loss improved from 0.10971 to 0.10570, saving model to model.h5
BTW, redefining the model followed by load_weights() definitely won't work, because save_weights() and load_weights() do not save/load the optimizer state.
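To see the difference, here is a minimal sketch (hypothetical file names, reusing the compiled model from the toy example above): model.save() stores the optimizer state together with the weights, whereas save_weights() stores only the layer weights, so after load_weights() Adam's moment estimates start from scratch.
model.save('full.h5')                         # architecture + weights + optimizer state
restored = load_model('full.h5')
print(len(restored.optimizer.get_weights()))  # > 0: Adam's moment estimates were restored
model.save_weights('weights_only.h5')         # layer weights only; the optimizer state is lost,
                                              # so a model that uses load_weights() restarts Adam from scratch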
Answered by David
I compared my code with this example http://machinelearningmastery.com/text-generation-lstm-recurrent-neural-networks-python-keras/ by carefully blocking out lines one at a time and rerunning. After a whole day, I finally found what was wrong.
When making the char-to-int mapping, I used:
# title_str_reduced is a string
chars = list(set(title_str_reduced))
# make char to int index mapping
char2int = {}
for i in range(len(chars)):
    char2int[chars[i]] = i
A set is an unordered data structure. In Python, when a set is converted to a list (which is ordered), the order is given arbitrarily. Thus my char2int dictionary was randomized every time I reopened Python. I fixed my code by adding sorted():
chars = sorted(list(set(title_str_reduced)))
This forces the conversion to a fixed order.
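A small self-contained demo of the fix (with a hypothetical input string): set iteration order is not stable across interpreter runs, so the derived mapping can silently change between sessions, while sorted() pins it down.
title_str_reduced = "hello world"              # hypothetical input
chars = sorted(list(set(title_str_reduced)))   # deterministic, run-to-run stable order
char2int = {c: i for i, c in enumerate(chars)}
print(char2int)                                # identical mapping on every run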
Answered by Mrinal Jain
The answer above uses TensorFlow 1.x. Here is an updated version using TensorFlow 2.x.
import numpy as np
from numpy.testing import assert_allclose
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.callbacks import ModelCheckpoint
vec_size = 100
n_units = 10
x_train = np.random.rand(500, 10, vec_size)
y_train = np.random.rand(500, vec_size)
model = Sequential()
model.add(LSTM(n_units, input_shape=(None, vec_size), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units))
model.add(Dropout(0.2))
model.add(Dense(vec_size, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')
# define the checkpoint
filepath = "model.h5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
# fit the model
model.fit(x_train, y_train, epochs=5, batch_size=50, callbacks=callbacks_list)
# load the model
new_model = load_model("model.h5")
assert_allclose(model.predict(x_train),
                new_model.predict(x_train),
                1e-5)
# fit the model
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
new_model.fit(x_train, y_train, epochs=5, batch_size=50, callbacks=callbacks_list)
Answered by user30012
The check-marked answer is not correct; the real problem is more subtle.
When you create a ModelCheckpoint, check its best attribute:
cp1 = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
cp1.best
You will see that this is set to np.inf. Unfortunately, that is the best the callback can do: it has no record of the loss from your previous training.
When you retrain and recreate the ModelCheckpoint, if you call fit and the loss happens to be lower than the previously known value, it seems to work. In more complex problems this is not the case, so you will end up saving a bad model and losing the best one.
The correct fix, with the modification shown below:
import numpy as np
from numpy.testing import assert_allclose
from keras.models import Sequential, load_model
from keras.layers import LSTM, Dropout, Dense
from keras.callbacks import ModelCheckpoint
vec_size = 100
n_units = 10
x_train = np.random.rand(500, 10, vec_size)
y_train = np.random.rand(500, vec_size)
model = Sequential()
model.add(LSTM(n_units, input_shape=(None, vec_size), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units))
model.add(Dropout(0.2))
model.add(Dense(vec_size, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')
# define the checkpoint
filepath = "model.h5"
cp1= ModelCheckpoint(filepath=filepath, monitor='loss', save_best_only=True, verbose=1, mode='min')
callbacks_list = [cp1]
# fit the model
model.fit(x_train, y_train, epochs=5, batch_size=50, shuffle=True, validation_split=0.1, callbacks=callbacks_list)
# load the model
new_model = load_model(filepath)
#assert_allclose(model.predict(x_train),new_model.predict(x_train), 1e-5)
score = model.evaluate(x_train, y_train, batch_size=50)
cp1 = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
cp1.best = score # <== ****THIS IS THE KEY **** See source for ModelCheckpoint
# fit the model
callbacks_list = [cp1]
new_model.fit(x_train, y_train, epochs=5, batch_size=50, callbacks=callbacks_list)
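If re-evaluating on the training data is expensive, an alternative sketch (assuming the question's checkpoint naming pattern "word2vec-{epoch:02d}-{loss:.4f}.hdf5") is to recover the previous best loss from the checkpoint filename before seeding the callback:
import re
filename = "word2vec-39-0.0027.hdf5"                   # hypothetical checkpoint from the question
match = re.match(r".*-(\d+)-([\d.]+)\.hdf5", filename)
initial_epoch = int(match.group(1)) + 1                # resume epoch numbering after epoch 39
cp1 = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
cp1.best = float(match.group(2))                       # seed the callback with the known best loss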
Answered by Anubhav Apurva
Here is the official Keras documentation on saving a model:
https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model
In this post the author provides two examples of saving and loading your model to a file, as shown in the sketch after this list:
- JSON format.
- YAML format.
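For reference, a minimal sketch of the JSON round trip (assuming model is a compiled Keras model, as in the examples above): note that to_json() serializes only the architecture, so the weights must be saved separately and the model recompiled before further training.
from keras.models import model_from_json

json_string = model.to_json()       # architecture only
model.save_weights('weights.h5')    # weights go to a separate file

rebuilt = model_from_json(json_string)
rebuilt.load_weights('weights.h5')
rebuilt.compile(loss='mean_squared_error', optimizer='adam')  # recompile before training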
Answered by bruce
I think you can write
model.save('partly_trained.h5')
and
model = load_model('partly_trained.h5')
instead of
model = Sequential()
model.add(LSTM(n_units, input_shape=(None, vec_size), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units))
model.add(Dropout(0.2))
model.add(Dense(vec_size, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')
Then continue training, because model.save stores both the architecture and the weights (along with the optimizer state).
Answered by MH304
Assume you have code like this:
model = some_model_you_made(input_img)  # you compiled your model in this
model.summary()
model_checkpoint = ModelCheckpoint('yours.h5', monitor='val_loss', verbose=1, save_best_only=True)
model_json = model.to_json()
with open("yours.json", "w") as json_file:
    json_file.write(model_json)
model.fit_generator(...)  # or model.fit(...)
Now turn your code into this:
model = some_model_you_made(input_img)  # same model here
model.summary()
model_checkpoint = ModelCheckpoint('yours.h5', monitor='val_loss', verbose=1, save_best_only=True)  # same checkpoint
model_json = model.to_json()
with open("yours.json", "w") as json_file:
    json_file.write(model_json)
with open('yours.json', 'r') as f:
    old_model = model_from_json(f.read())  # open the model you just saved (same as your last train) with a different name
old_model.load_weights('yours.h5')  # the model checkpoint you trained before
old_model.compile(...)  # need to compile again (exactly like the last compile)
# now start training with the checkpoint...
old_model.fit_generator(...)  # same stuff as the last train; or old_model.fit(...)