Python TensorFlow: Graph is finalized and cannot be modified

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow, original question: http://stackoverflow.com/questions/41798311/

Published: 2020-08-20 01:37:30  Source: igfitidea

Tensorflow : Graph is finalized and cannot be modified

python, tensorflow

Asked by itsamineral

I am trying to save variables through checkpoints to introduce fault tolerance into my program. I am trying to achieve this by using the MonitoredTrainingSession function. The following is my configuration:

import tensorflow as tf

global_step = tf.Variable(10, trainable=False, name='global_step')
x = tf.constant(2)

with tf.device("/job:local/task:0"):
    y1 = tf.Variable(x + 300)

with tf.device("/job:local/task:1"):
    y2 = tf.Variable(x**2)

with tf.device("/job:local/task:2"):
    y3 = tf.Variable(5*x)

with tf.device("/job:local/task:3"):
    y0 = tf.Variable(x - 66)
    y = y0 + y1 + y2 + y3

model = tf.global_variables_initializer()
saver = tf.train.Saver(sharded=True)

chief = tf.train.ChiefSessionCreator(scaffold=None, master='grpc://localhost:2222', config=None, checkpoint_dir='/home/tensorflow/codes/checkpoints')
summary_hook = tf.train.SummarySaverHook(save_steps=None, save_secs=10, output_dir='/home/tensorflow/codes/savepoints', summary_writer=None, scaffold=None, summary_op=tf.summary.tensor_summary(name="y", tensor=y))
saver_hook = tf.train.CheckpointSaverHook(checkpoint_dir='/home/tensorflow/codes/checkpoints', save_secs=None, save_steps=True, saver=saver, checkpoint_basename='model.ckpt', scaffold=None)

# with tf.train.MonitoredSession(session_creator=ChiefSessionCreator,hooks=[saver_hook, summary_hook]) as sess:

with tf.train.MonitoredTrainingSession(master='grpc://localhost:2222', is_chief=True, checkpoint_dir='/home/tensorflow/codes/checkpoints',
    scaffold=None, hooks=[saver_hook,summary_hook], chief_only_hooks=None, save_checkpoint_secs=None, save_summaries_steps=True, config=None) as sess:

    while not sess.should_stop():
        sess.run(tf.global_variables_initializer())

    while not sess.should_stop():
        result = sess.run(y)
        print(result)

I get the following RuntimeError, which I am unable to resolve:

Traceback (most recent call last):
  File "add_1.py", line 39, in <module>
    sess.run(tf.global_variables_initializer())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 1187, in global_variables_initializer
    return variables_initializer(global_variables())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 1169, in variables_initializer
    return control_flow_ops.group(*[v.initializer for v in var_list], name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2773, in group
    deps.append(_GroupControlDeps(dev, ops_on_device[dev]))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2721, in _GroupControlDeps
    return no_op(name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_control_flow_ops.py", line 186, in no_op
    result = _op_def_lib.apply_op("NoOp", name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2199, in create_op
    self._check_not_finalized()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1925, in _check_not_finalized
    raise RuntimeError("Graph is finalized and cannot be modified.")
RuntimeError: Graph is finalized and cannot be modified.

Answered by guinny

The root cause of your error is that MonitoredTrainingSession has finalized (frozen) the graph, so your tf.global_variables_initializer() call is no longer able to modify it.

Having said that, there are multiple things that require attention:

1) Why do you try to repeatedly initialize all variables here?

while not sess.should_stop():
    sess.run(tf.global_variables_initializer())
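For reference, here is a minimal sketch of where initialization belongs (assuming TF 1.x-style APIs, reachable via tf.compat.v1 on newer installs): build the init op into a Scaffold before the session is created, and MonitoredSession runs it once for you.

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()
tf.reset_default_graph()

v = tf.get_variable('v', initializer=42)

# Build the init op *before* the session exists: MonitoredSession
# finalizes the graph on creation and runs scaffold.init_op once.
scaffold = tf.train.Scaffold(init_op=tf.global_variables_initializer())
creator = tf.train.ChiefSessionCreator(scaffold=scaffold)

with tf.train.MonitoredSession(session_creator=creator) as sess:
    result = sess.run(v)
print(result)  # 42
```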

2) It seems some of your code is already handled by MonitoredTrainingSession, e.g. ChiefSessionCreator. Please take another look at the code (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/training/monitored_session.py#L243) or search for sample usage to see how MonitoredTrainingSession is supposed to be used.

Answered by matwilso

This may not be recommended for your use case, but it is possible to unfinalize a graph:

sess.graph._unsafe_unfinalize()
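A small sketch of what this does. Note that _unsafe_unfinalize() is a private, unsupported API that simply flips the finalized flag back; use it at your own risk (TF 1.x-style graph API assumed here).

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

g = tf.Graph()
with g.as_default():
    x = tf.constant(2)

g.finalize()  # this is what MonitoredSession does on creation

# The finalized graph now rejects new ops:
caught = False
try:
    with g.as_default():
        tf.constant(3)
except RuntimeError:
    caught = True

# Flip the flag back and the graph accepts ops again:
g._unsafe_unfinalize()
with g.as_default():
    y = tf.constant(3)
print(caught, y.graph is g)  # True True
```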

Answered by drimyus

If you want to rebuild the graph inside a loop, you can reset the default graph and create a new one at the top of each iteration:

import tensorflow as tf

tf.reset_default_graph()
with tf.Graph().as_default():
    # build the new graph here
    ...
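For example, a hedged sketch (TF 1.x-style API) of rebuilding and running a fresh graph on each loop iteration:

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

results = []
for i in range(3):
    tf.reset_default_graph()            # drop the previous default graph
    with tf.Graph().as_default() as g:  # build a brand-new graph
        y = tf.constant(i) * 10
        with tf.Session(graph=g) as sess:
            results.append(sess.run(y))
print(results)  # [0, 10, 20]
```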

Answered by drrngrvy

Since your aim is to use MonitoredTrainingSession to get checkpointing, the usage is much simpler than your example:

import tensorflow as tf

global_step = tf.contrib.framework.get_or_create_global_step()
x = tf.constant(2)
y1 = x + 300
y2 = x**2
y3 = x * 5
y0 = x - 66
y = y0 + y1 + y2 + y3
step = tf.assign_add(global_step, 1)

with tf.train.MonitoredTrainingSession(checkpoint_dir='/tmp/checkpoints') as sess:
    while not sess.should_stop():
        result, i = sess.run([y, step])
        print(result, i)
  • The hooks for saving/restoring checkpoints are created by MonitoredTrainingSession for you.
  • If you pass in save_checkpoint_secs you can change the checkpointing frequency from the 10-minute default. I find a higher frequency isn't worth it: saving checkpoints isn't free, so very frequent checkpointing will end up slowing training down.
  • The ChiefSessionCreator and gRPC config are only needed for distributed running (see the distributed TensorFlow documentation for a description of the concepts). The same goes for assigning ops to specific devices: make sure you really need to do this before using it, as it can slow things down if you're not careful.
  • You don't need to wrap the results of operations on tensors in tf.Variable(); they are already tensors you can run.
  • You can pass save_summaries_steps to monitor training with TensorBoard, but by default that happens every 100 steps anyway.
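To illustrate the checkpointing knobs above, here is a minimal sketch (TF 1.x-style API assumed; the temp directory is a stand-in for a real checkpoint path) that lets MonitoredTrainingSession create the saver hooks itself and tunes the frequency with save_checkpoint_secs:

```python
import tempfile
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()
tf.reset_default_graph()

global_step = tf.train.get_or_create_global_step()
step = tf.assign_add(global_step, 1)

ckpt_dir = tempfile.mkdtemp()  # stand-in for a real checkpoint directory
with tf.train.MonitoredTrainingSession(
        checkpoint_dir=ckpt_dir,
        save_checkpoint_secs=60) as sess:  # default would be 600s
    last = 0
    while not sess.should_stop() and last < 5:
        last = sess.run(step)
print(last)  # 5
```

No saver, hooks, or init ops are built by hand; the session creates them from checkpoint_dir, which is the usage the answer above recommends.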