Python TensorFlow: Graph is finalized and cannot be modified

Question

Asked by itsamineral

I am trying to save variables through checkpoints to add fault tolerance to my program, and I am using MonitoredTrainingSession to do so. The following is my configuration:

import tensorflow as tf

global_step = tf.Variable(10, trainable=False, name='global_step')
x = tf.constant(2)

with tf.device("/job:local/task:0"):
    y1 = tf.Variable(x + 300)

with tf.device("/job:local/task:1"):
    y2 = tf.Variable(x**2)

with tf.device("/job:local/task:2"):
    y3 = tf.Variable(5*x)

with tf.device("/job:local/task:3"):
    y0 = tf.Variable(x - 66)
    y = y0 + y1 + y2 + y3

model = tf.global_variables_initializer()
saver = tf.train.Saver(sharded=True)

chief = tf.train.ChiefSessionCreator(scaffold=None, master='grpc://localhost:2222', config=None, checkpoint_dir='/home/tensorflow/codes/checkpoints')
summary_hook = tf.train.SummarySaverHook(save_steps=None, save_secs=10, output_dir='/home/tensorflow/codes/savepoints', summary_writer=None, scaffold=None, summary_op=tf.summary.tensor_summary(name="y", tensor=y))
saver_hook = tf.train.CheckpointSaverHook(checkpoint_dir='/home/tensorflow/codes/checkpoints', save_secs=None, save_steps=True, saver=saver, checkpoint_basename='model.ckpt', scaffold=None)

# with tf.train.MonitoredSession(session_creator=ChiefSessionCreator,hooks=[saver_hook, summary_hook]) as sess:

with tf.train.MonitoredTrainingSession(master='grpc://localhost:2222', is_chief=True, checkpoint_dir='/home/tensorflow/codes/checkpoints',
    scaffold=None, hooks=[saver_hook,summary_hook], chief_only_hooks=None, save_checkpoint_secs=None, save_summaries_steps=True, config=None) as sess:

    while not sess.should_stop():
        sess.run(tf.global_variables_initializer())

    while not sess.should_stop():
        result = sess.run(y)
        print(result)

I get the following RuntimeError, which I am unable to resolve:

Traceback (most recent call last):
  File "add_1.py", line 39, in <module>
    sess.run(tf.global_variables_initializer())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 1187, in global_variables_initializer
    return variables_initializer(global_variables())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 1169, in variables_initializer
    return control_flow_ops.group(*[v.initializer for v in var_list], name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2773, in group
    deps.append(_GroupControlDeps(dev, ops_on_device[dev]))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2721, in _GroupControlDeps
    return no_op(name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_control_flow_ops.py", line 186, in no_op
    result = _op_def_lib.apply_op("NoOp", name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2199, in create_op
    self._check_not_finalized()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1925, in _check_not_finalized
    raise RuntimeError("Graph is finalized and cannot be modified.")
RuntimeError: Graph is finalized and cannot be modified.

Answer 1

Answered by guinny

The root cause of your error seems to be that MonitoredTrainingSession has finalized (frozen) the graph, so your tf.global_variables_initializer() call, which creates new ops, can no longer modify it.
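
To see why the call fails, it can help to model the mechanism without TensorFlow. The sketch below is a toy stand-in, not TensorFlow's implementation: `ToyGraph` and `make_initializer` are hypothetical names, and they only mimic the behaviour visible in the traceback, where creating an op first checks that the graph is not finalized.

```python
class ToyGraph:
    """Toy stand-in (hypothetical) for a computation graph that can be frozen."""

    def __init__(self):
        self.ops = []
        self.finalized = False

    def create_op(self, name):
        # Mirrors the check seen in the traceback: adding an op to a
        # finalized graph is an error.
        if self.finalized:
            raise RuntimeError("Graph is finalized and cannot be modified.")
        self.ops.append(name)
        return name

    def finalize(self):
        self.finalized = True


def make_initializer(graph):
    # Like tf.global_variables_initializer(), this adds a new op to the
    # graph on every call; it does not look up an existing one.
    return graph.create_op("init")


graph = ToyGraph()
init_op = make_initializer(graph)  # fine: the graph is still mutable
graph.finalize()                   # a monitored session does this when it is created

try:
    make_initializer(graph)        # same call, but after finalization
    error = None
except RuntimeError as exc:
    error = str(exc)

print(error)
```

In this model, reusing `init_op` (created before `finalize()`) would still be legal; only creating new ops is rejected.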

Having said that, there are multiple things that require attention:

1) Why do you try to repeatedly initialize all variables here?

while not sess.should_stop():
    sess.run(tf.global_variables_initializer())

2) It seems some of your code is already included in MonitoredTrainingSession, e.g. ChiefSessionCreator. Can you please take another look at the code (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/training/monitored_session.py#L243) or search for sample usage to see how MonitoredTrainingSession is supposed to be used?
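
One way to act on both points is to build every op before the session is created and let MonitoredTrainingSession take care of initialization. The following is a minimal sketch under several assumptions: it imports `tensorflow.compat.v1` so that it also runs on TensorFlow 2.x, it drops the device placement and gRPC master so it runs in a single process, and it uses a hypothetical helper `run_sum` together with a `StopAtStepHook` so that the loop terminates.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # only has an effect on TensorFlow 2.x


def run_sum(num_steps=3):
    """Hypothetical helper: build the whole graph, then open the session."""
    with tf.Graph().as_default():
        global_step = tf.train.get_or_create_global_step()
        x = tf.constant(2)
        y0 = tf.Variable(x - 66)
        y1 = tf.Variable(x + 300)
        y = y0 + y1
        step = tf.assign_add(global_step, 1)

        # Stop once global_step reaches num_steps.
        hooks = [tf.train.StopAtStepHook(last_step=num_steps)]

        results = []
        # No explicit initializer call: the session's default Scaffold
        # initializes the variables before the graph is finalized.
        with tf.train.MonitoredTrainingSession(hooks=hooks) as sess:
            while not sess.should_stop():
                value, _ = sess.run([y, step])
                results.append(int(value))
        return results


print(run_sum())
```

Nothing inside the `while` loop creates a new op, so the finalized graph is never modified.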

Answer 2

Answered by matwilso

This may not be recommended for your use case, but it is possible to unfinalize a Graph:

sess.graph._unsafe_unfinalize()

Answer 3

Answered by drimyus

If you want to re-initialize the graph on every loop iteration, you can create a new graph at the top of the loop:

import tensorflow as tf

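tf.reset_default_graph()  # start each iteration from an empty default graph
# as_default() only takes effect when used as a context manager:
with tf.Graph().as_default():
    pass  # build this iteration's ops and session here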

Answer 4

Answered by drrngrvy

Since your aim is to use MonitoredTrainingSession to get checkpointing, the usage is much simpler than your example:

import tensorflow as tf

global_step = tf.contrib.framework.get_or_create_global_step()
x = tf.constant(2)
y1 = x + 300
y2 = x**2
y3 = x * 5
y0 = x - 66
y = y0 + y1 + y2 + y3
step = tf.assign_add(global_step, 1)

with tf.train.MonitoredTrainingSession(checkpoint_dir='/tmp/checkpoints') as sess:
    while not sess.should_stop():
        result, i = sess.run([y, step])
        print(result, i)

The hooks for saving/restoring checkpoints are created by MonitoredTrainingSession for you.
If you pass in save_checkpoint_secs, you can change the frequency of checkpointing from the 10-minute default. I find a higher frequency isn't worth it: saving checkpoints isn't free, so very frequent checkpointing will end up slowing training down.
The ChiefSessionCreator and gRPC config are only needed for distributed running (see here for a description of the concepts). The same goes for assigning ops to specific devices: make sure you really need it before doing so, as it can slow things down if you're not careful.
You don't need to wrap the results of operations on tensors with tf.Variable(); they are already tensors that sess.run() can evaluate.
You can pass save_summaries_steps to monitor training with TensorBoard, but by default summaries are saved every 100 steps anyway.
