Python Deep Learning: Reasons for NaN Loss
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow.
Original: http://stackoverflow.com/questions/40050397/
Deep-Learning Nan loss reasons
Asked by Free Url
Perhaps too general a question, but can anyone explain what would cause a Convolutional Neural Network to diverge?
Specifics:
I am using Tensorflow's iris_training model with some of my own data and keep getting
ERROR:tensorflow:Model diverged with loss = NaN.
Traceback...
tensorflow.contrib.learn.python.learn.monitors.NanLossDuringTrainingError: NaN loss during training.
Traceback originated with line:
tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,
                               hidden_units=[300, 300, 300],
                               # optimizer=tf.train.ProximalAdagradOptimizer(learning_rate=0.001, l1_regularization_strength=0.00001),
                               n_classes=11,
                               model_dir="/tmp/iris_model")
I've tried adjusting the optimizer, using a learning rate of zero, and using no optimizer. Any insights into network layers, data size, etc. are appreciated.
Answered by chasep255
There are lots of things I have seen make a model diverge.
- Too high a learning rate. You can often tell this is the case if the loss begins to increase and then diverges to infinity.
- I am not too familiar with the DNNClassifier, but I am guessing it uses the categorical cross-entropy cost function. This involves taking the log of the prediction, which diverges as the prediction approaches zero. That is why people usually add a small epsilon value to the prediction to prevent this divergence. I am guessing the DNNClassifier probably does this or uses the TensorFlow op for it. Probably not the issue.
- Other numerical stability issues can exist, such as division by zero, where adding an epsilon can help. Another less obvious one is the square root, whose derivative can diverge if not properly simplified when dealing with finite-precision numbers. Yet again, I doubt this is the issue in the case of the DNNClassifier.
- You may have an issue with the input data. Try calling
assert not np.any(np.isnan(x))
on the input data to make sure you are not introducing NaNs. Also make sure all of the target values are valid. Finally, make sure the data is properly normalized. You probably want the pixels in the range [-1, 1] and not [0, 255]. The labels must be in the domain of the loss function, so if using a logarithm-based loss function all labels must be non-negative (as noted by Evan Pu and the comments below). A short sketch of these checks appears below.
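A minimal sketch of those checks, assuming NumPy arrays x (features) and labels (integer targets) plus a hypothetical n_classes value; the [-1, 1] scaling assumes uint8 pixel data:

import numpy as np

def check_inputs(x, labels, n_classes):
    # Fail fast if the input pipeline already contains NaN or Inf values.
    assert not np.any(np.isnan(x)), "NaN values in the input features"
    assert not np.any(np.isinf(x)), "Inf values in the input features"
    # Targets must stay inside the domain the loss function expects.
    assert labels.min() >= 0 and labels.max() < n_classes, "labels outside [0, n_classes)"

def normalize_pixels(x):
    # Map uint8 pixels from [0, 255] into [-1, 1].
    return x.astype(np.float32) / 127.5 - 1.0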
Answered by Evan Pu
If you're training for cross entropy, you want to add a small number like 1e-8 to your output probability.
Because log(0) is negative infinity, when your model is trained enough the output distribution will be very skewed. For instance, say I'm doing a 4-class output; in the beginning my probability looks like
0.25 0.25 0.25 0.25
but toward the end the probability will probably look like
1.0 0 0 0
If you take the cross entropy of this distribution, everything will explode. The fix is to artificially add a small number to all of the terms to prevent this.
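A minimal NumPy sketch of this idea; the function name and the 1e-8 epsilon are only illustrative:

import numpy as np

def safe_cross_entropy(y_true_onehot, y_pred_probs, eps=1e-8):
    # Clip predictions away from 0 so log(0) = -inf never reaches the loss.
    y_pred_probs = np.clip(y_pred_probs, eps, 1.0)
    return float(np.mean(-np.sum(y_true_onehot * np.log(y_pred_probs), axis=-1)))

# A fully confident but wrong prediction would otherwise produce inf/NaN:
y_true = np.array([[0.0, 1.0, 0.0, 0.0]])
y_pred = np.array([[1.0, 0.0, 0.0, 0.0]])
print(safe_cross_entropy(y_true, y_pred))  # large but finite (about 18.4)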
Answered by Guido
In my case I got NaN when using distant integer labels, i.e.:
- Labels [0..100]: the training was OK,
- Labels [0..100] plus one additional label 8000: then I got NaNs.
So, don't use a very distant label.
EDIT: You can see the effect in the following simple code:
from keras.models import Sequential
from keras.layers import Dense, Activation
import numpy as np

# 20 random samples with 5 features; labels drawn from 0..4
X = np.random.random(size=(20, 5))
y = np.random.randint(0, high=5, size=(20, 1))

# Small classifier with 5 output units, so valid labels are 0..4
model = Sequential([
    Dense(10, input_dim=X.shape[1]),
    Activation('relu'),
    Dense(5),
    Activation('softmax')
])
model.compile(optimizer="Adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

print('fit model with labels in range 0..5')
history = model.fit(X, y, epochs=5)

# Append one extra sample whose label (8000) lies far outside the valid range
X = np.vstack((X, np.random.random(size=(1, 5))))
y = np.vstack((y, [[8000]]))

print('fit model with labels in range 0..5 plus 8000')
history = model.fit(X, y, epochs=5)
The result shows the NaNs after adding the label 8000:
fit model with labels in range 0..5
Epoch 1/5
20/20 [==============================] - 0s 25ms/step - loss: 1.8345 - acc: 0.1500
Epoch 2/5
20/20 [==============================] - 0s 150us/step - loss: 1.8312 - acc: 0.1500
Epoch 3/5
20/20 [==============================] - 0s 151us/step - loss: 1.8273 - acc: 0.1500
Epoch 4/5
20/20 [==============================] - 0s 198us/step - loss: 1.8233 - acc: 0.1500
Epoch 5/5
20/20 [==============================] - 0s 151us/step - loss: 1.8192 - acc: 0.1500
fit model with labels in range 0..5 plus 8000
Epoch 1/5
21/21 [==============================] - 0s 142us/step - loss: nan - acc: 0.1429
Epoch 2/5
21/21 [==============================] - 0s 238us/step - loss: nan - acc: 0.2381
Epoch 3/5
21/21 [==============================] - 0s 191us/step - loss: nan - acc: 0.2381
Epoch 4/5
21/21 [==============================] - 0s 191us/step - loss: nan - acc: 0.2381
Epoch 5/5
21/21 [==============================] - 0s 188us/step - loss: nan - acc: 0.2381
Answered by yper
If using integers as targets, make sure they aren't symmetrical at 0.
I.e., don't use classes -1, 0, 1. Use instead 0, 1, 2.
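One possible way to do the remapping, as a small sketch (np.unique with return_inverse maps arbitrary integer labels onto 0..K-1):

import numpy as np

labels = np.array([-1, 0, 1, 1, -1])
# return_inverse gives each label's index into the sorted unique values,
# i.e. -1 -> 0, 0 -> 1, 1 -> 2.
_, remapped = np.unique(labels, return_inverse=True)
print(remapped)  # [0 1 2 2 0]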
Answered by Kevin Johnsrude
If you'd like to gather more information on the error and if the error occurs in the first few iterations, I suggest you run the experiment in CPU-only mode (no GPUs). The error message will be much more specific.
Source: https://github.com/tensorflow/tensor2tensor/issues/574
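One common way to force a CPU-only run (assuming NVIDIA GPUs managed through CUDA) is to hide the GPUs before TensorFlow is imported:

import os

# Hiding all CUDA devices makes TensorFlow fall back to the CPU, where the
# failing op is usually reported with a more specific error message.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import tensorflow as tf  # import only after the environment variable is set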
Answered by chrishmorris
Regularization can help. For a classifier, there is a good case for activity regularization, whether it is a binary or a multi-class classifier. For a regressor, kernel regularization might be more appropriate.
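A hedged Keras sketch of both options; the layer sizes and regularization strengths here are arbitrary placeholders:

from keras import regularizers
from keras.layers import Dense

# Classifier hidden layer: penalize large activations (activity regularization).
clf_hidden = Dense(64, activation='relu',
                   activity_regularizer=regularizers.l2(1e-5))

# Regression output: penalize large weights instead (kernel regularization).
reg_output = Dense(1, kernel_regularizer=regularizers.l2(1e-4))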
Answered by Lerner Zhang
I'd like to plug in some (shallow) reasons I have experienced as follows:
- we may have updated our dictionary (for NLP tasks), but the model and the prepared data used a different one.
- we may have reprocessed our data (binary tf_record), but we loaded the old model. The reprocessed data may conflict with the previous one.
- we may have meant to train the model from scratch, but we forgot to delete the checkpoints, and the model loaded the latest parameters automatically (see the sketch after this list).
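For the last point, a minimal sketch of clearing the checkpoint directory before retraining from scratch ("/tmp/iris_model" is the model_dir from the question):

import os
import shutil

model_dir = "/tmp/iris_model"
# Remove stale checkpoints so the estimator cannot silently restore them.
if os.path.exists(model_dir):
    shutil.rmtree(model_dir)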
Hope that helps.