cherry.optim

Description

Optimization utilities for scalable, high-performance reinforcement learning.

Distributed

Distributed(params, opt, sync=None)

[Source]

Description

Synchronizes the gradients of a model across replicas.

At every step, Distributed averages the gradients across all replicas before calling the wrapped optimizer. The sync argument determines how frequently the parameters themselves are synchronized between replicas, in order to limit numerical divergence; this is done by calling the sync_parameters() method. If sync is None, parameters are only synchronized once, upon initialization of the class.
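For intuition, here is a minimal sketch of the gradient-averaging step, assuming the replicas communicate through torch.distributed and that the process group is already initialized. It illustrates the idea rather than cherry's exact implementation, and averaged_step is a hypothetical helper.

import torch.distributed as dist

def averaged_step(params, opt):
    # Average each parameter's gradient over all replicas, then take one step.
    world_size = dist.get_world_size()
    for p in params:
        if p.grad is not None:
            dist.all_reduce(p.grad.data, op=dist.ReduceOp.SUM)  # sum gradients over replicas
            p.grad.data.div_(world_size)                        # turn the sum into a mean
    opt.step()  # the wrapped optimizer now sees the averaged gradients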

Arguments

  * params (iterable) - Iterable of the model parameters to synchronize.
  * opt (Optimizer) - The optimizer to wrap and call once gradients have been averaged.
  * sync (int, optional, default=None) - How often, in steps, to synchronize parameters between replicas.

References

  1. Zinkevich et al. 2010. “Parallelized Stochastic Gradient Descent.”

Example

import torch.optim as optim
from cherry.optim import Distributed

opt = optim.Adam(model.parameters())
opt = Distributed(model.parameters(), opt, sync=1)

opt.step()             # average gradients across replicas, then run Adam
opt.sync_parameters()  # broadcast the root replica's parameters to all others
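The example assumes the replicas already belong to a common process group. With a torch.distributed backend (an assumption about the setup), each replica would first run something like:

import torch.distributed as dist

# rank and world_size are placeholders, typically provided by the launcher.
dist.init_process_group(backend='gloo', rank=rank, world_size=world_size)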

sync_parameters

Distributed.sync_parameters(root=0)

Description

Broadcasts all parameters of the root replica to every other replica.

Arguments

  * root (int, optional, default=0) - Rank of the replica whose parameters are broadcast to the others.
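For reference, a broadcast of this kind could be sketched with torch.distributed as follows; broadcast_parameters is a hypothetical helper, not cherry's exact implementation.

import torch.distributed as dist

def broadcast_parameters(params, root=0):
    for p in params:
        # Every replica overwrites its local copy with the tensor held by the root replica.
        dist.broadcast(p.data, src=root)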