Python: Logging training and validation loss in TensorBoard

Disclaimer: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me), citing the original source: StackOverflow, http://stackoverflow.com/questions/34471563/

Logging training and validation loss in tensorboard

Tags: python, tensorflow, tensorboard

Asked by user3468216

I'm trying to learn how to use tensorflow and tensorboard. I have a test project based on the MNIST neural net tutorial.

In my code, I construct a node that calculates the fraction of digits in a data set that are correctly classified, like this:

correct = tf.nn.in_top_k(self._logits, labels, 1)
correct = tf.to_float(correct)
accuracy = tf.reduce_mean(correct)

Here, self._logits is the inference part of the graph, and labels is a placeholder that contains the correct labels.

Now, what I would like to do is evaluate the accuracy for both the training set and the validation set as training proceeds. I can do this by running the accuracy node twice, with different feed_dicts:

train_acc = sess.run(accuracy, feed_dict={images : training_set.images, labels : training_set.labels})
valid_acc = sess.run(accuracy, feed_dict={images : validation_set.images, labels : validation_set.labels})

This works as intended. I can print the values, and I can see that initially, the two accuracies will both increase, and eventually the validation accuracy will flatten out while the training accuracy keeps increasing.

However, I would also like to get graphs of these values in tensorboard, and I can not figure out how to do this. If I simply add a scalar_summary to accuracy, the logged values will not distinguish between training set and validation set.

I also tried creating two identical accuracy nodes with different names and running one on the training set and one on the validation set. I then added a scalar_summary to each of these nodes. This does give me two graphs in tensorboard, but instead of one graph showing the training set accuracy and one showing the validation set accuracy, they both show identical values that do not match either of the ones printed to the terminal.

I am probably misunderstanding how to solve this problem. What is the recommended way of separately logging the output from a single node for different inputs?

Answered by mrry

There are several different ways you could achieve this, but you're on the right track with creating different tf.summary.scalar() nodes. Since you must explicitly call SummaryWriter.add_summary() each time you want to log a quantity to the event file, the simplest approach is probably to fetch the appropriate summary node each time you want to get the training or validation accuracy:

accuracy = tf.reduce_mean(correct)

training_summary = tf.summary.scalar("training_accuracy", accuracy)
validation_summary = tf.summary.scalar("validation_accuracy", accuracy)

summary_writer = tf.summary.FileWriter(...)

for step in range(NUM_STEPS):

  # Perform a training step....

  if step % LOG_PERIOD == 0:

    # To log training accuracy.
    train_acc, train_summ = sess.run(
        [accuracy, training_summary],
        feed_dict={images : training_set.images, labels : training_set.labels})
    summary_writer.add_summary(train_summ, step)

    # To log validation accuracy.
    valid_acc, valid_summ = sess.run(
        [accuracy, validation_summary],
        feed_dict={images : validation_set.images, labels : validation_set.labels})
    summary_writer.add_summary(valid_summ, step)

Alternatively, you could create a single summary op whose tag is a tf.placeholder(tf.string, []) and feed the string "training_accuracy" or "validation_accuracy" as appropriate.

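A minimal sketch of that placeholder-tag variant, for illustration only: it relies on the older tf.scalar_summary op, which accepted a string tensor as its tag (the newer tf.summary.scalar expects a plain Python string), and it reuses accuracy, sess, summary_writer, images, labels and step from the snippet above.

summary_tag = tf.placeholder(tf.string, [])
accuracy_summary = tf.scalar_summary(summary_tag, accuracy)

# Inside the training loop, feed the tag together with the matching data,
# e.g. "training_accuracy" with the training set:
train_summ = sess.run(accuracy_summary,
                      feed_dict={summary_tag: "training_accuracy",
                                 images: training_set.images,
                                 labels: training_set.labels})
summary_writer.add_summary(train_summ, step)
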
Answered by stillPatrick

Another way to do it is to use a second file writer, so that you are able to use the merge_summaries command.

train_writer = tf.summary.FileWriter(FLAGS.summaries_dir + '/train',
                                      sess.graph)
test_writer = tf.summary.FileWriter(FLAGS.summaries_dir + '/test')
tf.global_variables_initializer().run()
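
With the two writers in place, a rough usage sketch (following the linked tutorial; merged, train_step, accuracy and the two feed dicts are assumed to exist elsewhere in the script) looks like this:

merged = tf.summary.merge_all()

for step in range(NUM_STEPS):
  # Run a training step and log the merged summaries under the train run.
  summ, _ = sess.run([merged, train_step], feed_dict=train_feed_dict)
  train_writer.add_summary(summ, step)

  # Evaluate the same summary ops on the test data and log them under the
  # test run; TensorBoard then overlays the two runs on the same charts.
  summ, acc = sess.run([merged, accuracy], feed_dict=test_feed_dict)
  test_writer.add_summary(summ, step)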

The complete documentation is here, and it works fine for me: TensorBoard: Visualizing Learning
