"freeze" some variables/scopes in tensorflow: stop_gradient vs passing variables to minimize

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow

Original URL: http://stackoverflow.com/questions/35298326/
Asked by Dima Lituiev
I am trying to implement an Adversarial NN, which requires 'freezing' one or the other part of the graph during alternating training minibatches. I.e. there are two sub-networks: G and D.
G( Z ) -> Xz
D( X ) -> Y
where the loss function of G depends on D[G(Z)], D[X].
First I need to train the parameters in D with all G parameters fixed, and then the parameters in G with the parameters in D fixed. The loss function in the first case is the negative of the loss function in the second case, and each update must apply only to the parameters of the first or the second subnetwork.
I saw that tensorflow has a tf.stop_gradient function. For the purpose of training the D (downstream) subnetwork, I can use this function to block the gradient flow:
Z -> [ G ] -> tf.stop_gradient(Xz) -> [ D ] -> Y
The tf.stop_gradient documentation is very succinct, with no in-line example (and the example seq2seq.py is too long and not that easy to read), but it looks like the function must be called during graph creation. Does this imply that if I want to block/unblock gradient flow in alternating batches, I need to re-create and re-initialize the graph model?
Also, it seems that one cannot block the gradient flowing through the G (upstream) network by means of tf.stop_gradient, right?
As an alternative, I saw that one can pass a list of variables to the optimizer call as opt_op = opt.minimize(cost, <list of variables>), which would be an easy solution if one could get all the variables in the scope of each subnetwork. Can one get a <list of variables> for a tf.scope?
Accepted answer by mrry
The easiest way to achieve this, as you mention in your question, is to create two optimizer operations using separate calls to opt.minimize(cost, ...). By default, the optimizer will use all of the variables in tf.trainable_variables(). If you want to filter the variables to a particular scope, you can use the optional scope argument to tf.get_collection() as follows:
optimizer = tf.train.AdagradOptimizer(0.01)

first_train_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                     "scope/prefix/for/first/vars")
first_train_op = optimizer.minimize(cost, var_list=first_train_vars)

second_train_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                      "scope/prefix/for/second/vars")
second_train_op = optimizer.minimize(cost, var_list=second_train_vars)
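The scope argument to tf.get_collection() filters the collection by matching the scope string as a regex against the start of each item's name. A minimal plain-Python sketch of that filtering behavior (the variable names below are hypothetical, of the kind TF1 creates under name scopes):

```python
import re

# Hypothetical trainable-variable names, as produced under two name scopes.
trainable = [
    "generator/dense/kernel:0",
    "generator/dense/bias:0",
    "discriminator/dense/kernel:0",
    "discriminator/dense/bias:0",
]

def filter_by_scope(names, scope):
    # Mimics tf.get_collection(key, scope): keep items whose name matches
    # the scope string via re.match, i.e. as a prefix regex.
    return [n for n in names if re.match(scope, n)]

gen_vars = filter_by_scope(trainable, "generator")
disc_vars = filter_by_scope(trainable, "discriminator")
```

Passing each filtered list as var_list then restricts the corresponding train op to that subnetwork's parameters.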
Answered by Daniel Slater
Another option you might want to consider is setting trainable=False on a variable, which means it will not be modified by training:
tf.Variable(my_weights, trainable=False)
Answered by user3307732
I don't know if my approach has downsides, but I solved this issue for myself with this construct:
do_gradient = <Tensor that evaluates to 0 or 1>
no_gradient = 1 - do_gradient
wrapped_op = do_gradient * original + no_gradient * tf.stop_gradient(original)
So if do_gradient = 1, the values and gradients will flow through just fine, but if do_gradient = 0, then the values will only flow through the stop_gradient op, which stops the gradients from flowing back.
For my scenario, hooking do_gradient up to an index of a random_shuffle tensor let me randomly train different pieces of my network.
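To see why this works: the forward value of wrapped_op always equals original (since d*x + (1-d)*x = x for any d), while only the d * original branch carries gradient. A plain-Python sketch of that identity, modeling stop_gradient as "same value, zero derivative":

```python
def wrapped(x, dx, do_gradient):
    # Forward pass: do_gradient * x + (1 - do_gradient) * x == x,
    # so the output value is unchanged regardless of do_gradient.
    value = do_gradient * x + (1 - do_gradient) * x
    # Backward pass: the stop_gradient branch contributes zero derivative,
    # so the incoming gradient dx is scaled by do_gradient.
    grad = do_gradient * dx + (1 - do_gradient) * 0.0
    return value, grad

v_on, g_on = wrapped(3.0, 1.0, do_gradient=1)    # gradients flow
v_off, g_off = wrapped(3.0, 1.0, do_gradient=0)  # gradients blocked, value unchanged
```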
Answered by Alex Williams
@mrry's answer is completely right and perhaps more general than what I'm about to suggest. But I think a simpler way to accomplish it is to just pass the python reference directly to var_list:
W = tf.Variable(...)
C = tf.Variable(...)
Y_est = tf.matmul(W,C)
loss = tf.reduce_sum((data-Y_est)**2)
optimizer = tf.train.AdamOptimizer(0.001)
# You can pass the python object directly
train_W = optimizer.minimize(loss, var_list=[W])
train_C = optimizer.minimize(loss, var_list=[C])
I have a self-contained example here: https://gist.github.com/ahwillia/8cedc710352eb919b684d8848bc2df3a
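The same var_list idea drives the alternating training loop: run train_W and train_C in turn, each step updating only its own parameter block while the other stays fixed. A minimal plain-Python analogue with hand-written gradients for the scalar loss (data - w*c)^2 (the learning rate, step count, and target value are arbitrary choices for illustration):

```python
def loss(w, c, data=6.0):
    return (data - w * c) ** 2

def grad_w(w, c, data=6.0):
    # d/dw of (data - w*c)^2
    return -2.0 * (data - w * c) * c

def grad_c(w, c, data=6.0):
    # d/dc of (data - w*c)^2
    return -2.0 * (data - w * c) * w

w, c, lr = 1.0, 1.0, 0.05
for step in range(200):
    if step % 2 == 0:
        w -= lr * grad_w(w, c)   # analogue of train_W: C held fixed
    else:
        c -= lr * grad_c(w, c)   # analogue of train_C: W held fixed
```

After the loop, w * c has converged to the target even though each step only ever touched one variable.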