cherry.optim

cherry.optim.Distributed

Description
Synchronizes the gradients of a model across replicas. At every step, Distributed averages the gradients across all replicas before calling the wrapped optimizer. The sync parameter determines how frequently the parameters are synchronized between replicas in order to minimize numerical divergence; this is done by calling the sync_parameters() method. If sync is None, parameters are only synchronized upon initialization of the class (or when sync_parameters() is called explicitly).
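
For illustration, the sketch below shows one way Distributed might sit in a one-process-per-replica training script. It assumes the default torch.distributed process group can be initialized (for instance by launching the script with torchrun, which provides the required environment variables); the model, data, and hyperparameters are placeholders.

import torch
import torch.distributed as dist
from torch import nn, optim
from cherry.optim import Distributed

dist.init_process_group(backend='gloo')  # one process per replica

model = nn.Linear(10, 2)
opt = optim.SGD(model.parameters(), lr=0.1)
opt = Distributed(model.parameters(), opt, sync=1)  # re-synchronize parameters every step

for step in range(100):
    loss = model(torch.randn(32, 10)).pow(2).mean()
    model.zero_grad()
    loss.backward()
    opt.step()  # gradients are averaged across replicas before the SGD update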
References
- Zinkevich et al. 2010. “Parallelized Stochastic Gradient Descent.”
Example
from torch import optim
from cherry.optim import Distributed

opt = optim.Adam(model.parameters())
opt = Distributed(model.parameters(), opt, sync=1)
opt.step()  # averages gradients across replicas, then calls Adam
opt.sync_parameters()  # explicitly re-synchronizes parameters across replicas
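
Note that, per the description above, with sync=1 the parameters are already re-synchronized on every call to step(), so the explicit sync_parameters() call here mainly illustrates the manual API; calling it yourself is how parameters are synchronized when sync is None.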