Original URL: http://stackoverflow.com/questions/54716377/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How to do gradient clipping in PyTorch?
Asked by Gulzar
What is the correct way to perform gradient clipping in PyTorch?
I have an exploding gradients problem, and I need to program my way around it.
Accepted answer by a_guest
clip_grad_norm (which is actually deprecated in favor of clip_grad_norm_, following the more consistent syntax of a trailing _ when in-place modification is performed) clips the norm of the overall gradient by concatenating all parameters passed to the function, as can be seen from the documentation:
The norm is computed over all gradients together, as if they were concatenated into a single vector. Gradients are modified in-place.
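The arithmetic behind that description can be sketched in plain Python. This is only an illustration of the idea, not PyTorch's implementation: the helper name clip_by_global_norm and the list-of-lists gradient layout are assumptions made for the example.

```python
import math

def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Hypothetical helper: rescale all gradients by one shared factor.

    `grads` is a list of lists of floats, one inner list per parameter.
    Returns the rescaled gradients and the norm measured before clipping.
    """
    # One L2 norm over every value, as if all gradients were concatenated.
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # Only ever shrink; gradients below the threshold are left unchanged.
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in grad] for grad in grads]
    return clipped, total_norm

grads = [[3.0, 0.0], [0.0, 4.0]]  # two "parameters", global norm 5.0
clipped, total_norm = clip_by_global_norm(grads, max_norm=1.0)
```

Because every value is multiplied by the same factor, the direction of the overall gradient is preserved and only its length shrinks.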
From your example it looks like you want clip_grad_value_ instead, which has a similar syntax and also modifies the gradients in-place:
clip_grad_value_(model.parameters(), clip_value)
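To make the contrast with norm clipping concrete, here is a plain-Python sketch of what clipping by value does to a single gradient. The helper name clip_by_value is hypothetical and the gradient is represented as a flat list of floats purely for illustration.

```python
def clip_by_value(grad, clip_value):
    """Hypothetical helper: clamp each element to [-clip_value, clip_value]."""
    return [max(-clip_value, min(clip_value, g)) for g in grad]

grad = [10.0, 0.5, -3.0]
clipped = clip_by_value(grad, clip_value=1.0)
# clipped is [1.0, 0.5, -1.0]: large entries are cut, small ones untouched
```

Each element is clamped independently, so the ratios between components change. Value clipping can therefore alter the direction of the gradient, whereas norm clipping only rescales it.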
Another option is to register a backward hook. The hook takes the current gradient as input and may return a tensor which will be used in place of the previous gradient, i.e. modifying it. The hook is called each time a gradient has been computed, so there is no need for manual clipping once the hook has been registered:
for p in model.parameters():
    p.register_hook(lambda grad: torch.clamp(grad, -clip_value, clip_value))
Answer by Rahul
A more complete example:
optimizer.zero_grad()
loss, hidden = model(data, hidden, targets)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip)
optimizer.step()
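The snippet above depends on a model, data, and args defined elsewhere. A self-contained version might look like the following sketch, where the toy nn.Linear model, the random data, the SGD optimizer, and the threshold max_norm = 1.0 are all assumptions chosen only to make it runnable.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy setup: one linear layer and random data.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = torch.randn(8, 4)
targets = torch.randn(8, 1) * 100.0  # large targets to provoke large gradients
max_norm = 1.0  # arbitrary threshold

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(data), targets)
loss.backward()

# Clip after backward() and before step(); the return value is the
# total gradient norm measured before clipping.
norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

# Recompute the global norm to confirm the gradients were rescaled.
norm_after = torch.norm(
    torch.stack([p.grad.norm(2) for p in model.parameters()]), 2
)
optimizer.step()
```

The ordering matters: clipping must happen after the gradients exist and before the optimizer consumes them.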
Answer by Gulzar
Reading through the forum discussion gave this:
clipping_value = 1  # arbitrary value of your choosing
torch.nn.utils.clip_grad_norm_(model.parameters(), clipping_value)
I'm sure there is more depth to it than only this code snippet.