Python: Why do we need to call zero_grad() in PyTorch?

Question

Asked by user1424739

The method zero_grad() needs to be called during training, but the documentation is not very helpful:

|  zero_grad(self)
|      Sets gradients of all model parameters to zero.

Why do we need to call this method?

Answer 1

Answered by kmario23

In PyTorch, we need to set the gradients to zero before starting backpropagation because PyTorch accumulates the gradients on subsequent backward passes. This is convenient while training RNNs. So, the default action is to accumulate (i.e. sum) the gradients on every loss.backward() call.
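
The accumulation is easy to observe on a single tensor. The snippet below is a minimal sketch, not part of the original answer: the parameter w and the toy loss w * 3 are made up for illustration.

```python
import torch

# A single parameter; the loss w * 3 has gradient 3 with respect to w.
w = torch.tensor([1.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()      # gradient after one backward pass

(w * 3).sum().backward()    # no zeroing in between
second = w.grad.clone()     # the new gradient was added to the old one

w.grad.zero_()              # reset the accumulated gradient in place
(w * 3).sum().backward()
third = w.grad.clone()      # back to the gradient of a single pass

print(first.item(), second.item(), third.item())
```

Here second is twice first, and third matches first again because the gradient was zeroed before the last backward pass.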

Because of this, at the start of each iteration of your training loop you should zero out the gradients so that the parameter update is computed correctly. Otherwise the gradient is a mix of the current gradient and the leftover ones from earlier steps, and it points in some direction other than the intended one towards the minimum (or maximum, for maximization objectives).
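
To make the "wrong direction" concrete, here is a small sketch with two made-up losses whose gradients at w = 0 have opposite signs. The losses and the names stale and fresh are illustrative assumptions, not from the original answer.

```python
import torch

w = torch.tensor([0.0], requires_grad=True)

# Step 1: the gradient of (w + 3)^2 at w = 0 is 2 * (0 + 3) = 6.
((w + 3) ** 2).sum().backward()

# Step 2 without zeroing: the gradient of (w - 1)^2 at w = 0 is -2,
# but it is added to the leftover 6 from step 1.
((w - 1) ** 2).sum().backward()
stale = w.grad.item()     # 6 + (-2) = 4, the wrong sign for this loss

# Step 2 with zeroing: only the current loss contributes.
w.grad.zero_()
((w - 1) ** 2).sum().backward()
fresh = w.grad.item()     # -2

print(stale, fresh)
```

A descent step along stale would move w away from the minimum of the second loss at w = 1, while a step along fresh moves towards it.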

Here is a simple example:

import torch
import torch.optim as optim

def linear_model(x, W, b):
    return torch.matmul(x, W) + b

data, targets = ...

# Variable is deprecated since PyTorch 0.4; plain tensors track gradients
W = torch.randn(4, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

optimizer = optim.Adam([W, b])

for sample, target in zip(data, targets):
    # clear out the gradients of all tensors
    # in this optimizer (i.e. W, b)
    optimizer.zero_grad()
    output = linear_model(sample, W, b)
    # backward() needs a scalar, so sum the squared errors
    loss = ((output - target) ** 2).sum()
    loss.backward()
    optimizer.step()
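
The loop above leaves data and targets elided. For anyone who wants something to run end to end, the following is a rough sketch on synthetic data. Everything about the data (true_W, the shapes, the seed), the helper mean_loss, the learning rate, and the epoch count are assumptions made for illustration only.

```python
import torch
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical synthetic data: 64 samples with 4 features and 3 targets each,
# generated from a known linear map so the loss can actually decrease.
true_W = torch.randn(4, 3)
data = torch.randn(64, 4)
targets = data @ true_W

W = torch.randn(4, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
optimizer = optim.Adam([W, b], lr=0.05)  # lr chosen arbitrarily for the sketch

def mean_loss():
    # Mean squared error over the whole synthetic set, without tracking gradients.
    with torch.no_grad():
        return ((data @ W + b - targets) ** 2).mean().item()

loss_before = mean_loss()
for epoch in range(20):
    for sample, target in zip(data, targets):
        optimizer.zero_grad()                          # drop the previous step's gradients
        loss = ((sample @ W + b - target) ** 2).sum()  # scalar loss for one sample
        loss.backward()                                # populate W.grad and b.grad
        optimizer.step()                               # update W and b
loss_after = mean_loss()

print(loss_before, loss_after)
```

The only point of the sketch is the ordering inside the inner loop: zero_grad(), forward pass, backward(), then step().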

Alternatively, if you're doing vanilla gradient descent by hand, without an optimizer, then:

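W = torch.randn(4, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
learning_rate = 0.01  # assumed value; the original leaves it undefined

for sample, target in zip(data, targets):
    output = linear_model(sample, W, b)
    loss = ((output - target) ** 2).sum()
    loss.backward()

    # update in place without recording the update in the autograd graph
    with torch.no_grad():
        W -= learning_rate * W.grad
        b -= learning_rate * b.grad

    # clear out the gradients of W and b before the next backward pass
    # (.grad is None until the first backward(), so zero after, not before)
    W.grad.zero_()
    b.grad.zero_()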

Note: The accumulation (i.e. sum) of gradients happens when .backward() is called on the loss tensor.
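
One way to read this note: calling .backward() once per loss without zeroing gives the same gradient as calling it once on the sum of those losses. The check below is a sketch under made-up inputs (x1 and x2 are hypothetical micro-batches, and the squared-output loss is arbitrary).

```python
import torch

torch.manual_seed(0)
W = torch.randn(4, 3, requires_grad=True)
x1, x2 = torch.randn(8, 4), torch.randn(8, 4)   # two hypothetical micro-batches

# One backward call per micro-batch: the gradients add up in W.grad.
(x1 @ W).pow(2).sum().backward()
(x2 @ W).pow(2).sum().backward()
accumulated = W.grad.clone()

# Zero the gradient, then make one backward call on the summed loss.
W.grad.zero_()
((x1 @ W).pow(2).sum() + (x2 @ W).pow(2).sum()).backward()
combined = W.grad.clone()

same = torch.allclose(accumulated, combined)
print(same)
```

This is why the accumulating default can be useful on purpose, for example to sum gradients over several small batches before a single update, as long as the gradients are zeroed once per update.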
