cherry.pg

Description

Utilities to implement policy gradient algorithms.

generalized_advantage

generalized_advantage(gamma, tau, rewards, dones, values, next_value)

Description

Computes the generalized advantage estimator (GAE).
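For orientation, the estimator from Schulman et al. can be written as follows. This is a sketch of the standard definition, not a transcription of this library's code: the symbols $r_t$, $d_t$, $V$, and $T$ are notation introduced here, `tau` plays the role the paper calls $\lambda$, and masking with the termination flag $d_t$ is assumed as the usual implementation convention.

```latex
\begin{aligned}
\delta_t &= r_t + \gamma \,(1 - d_t)\, V(s_{t+1}) - V(s_t), \\
\hat{A}_t &= \delta_t + \gamma \tau \,(1 - d_t)\, \hat{A}_{t+1},
\qquad \hat{A}_T = 0 .
\end{aligned}
```

Here $V(s_T)$ for the state after the last transition corresponds to `next_value`. Setting `tau = 0` reduces $\hat{A}_t$ to the one-step TD error $\delta_t$, while `tau = 1` gives the discounted return minus the value baseline.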

References

  1. Schulman et al. 2015. “High-Dimensional Continuous Control Using Generalized Advantage Estimation”
  2. https://github.com/joschu/modular_rl/blob/master/modular_rl/core.py#L49

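Arguments

  * gamma (float) - Discount factor.
  * tau (float) - Bias-variance trade-off parameter of GAE (λ in Schulman et al.).
  * rewards (tensor) - Rewards collected at each step.
  * dones (tensor) - Episode-termination flags for each step.
  * values (tensor) - Value estimates for the states that produced the rewards.
  * next_value (tensor) - Value estimate for the state that follows the last transition.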

Returns

  * tensor - Advantage estimates, one per step.

Example

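from cherry.pg import generalized_advantage

# `policy` and `replay` are assumed to be defined by the surrounding training code.
mass, next_value = policy(replay[-1].next_state)
advantages = generalized_advantage(0.99,             # gamma
                                   0.95,             # tau
                                   replay.reward(),  # rewards
                                   replay.done(),    # dones
                                   replay.value(),   # values
                                   next_value)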
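To make the computation concrete, here is a minimal pure-Python sketch of the backward recursion commonly used for GAE. The name `gae_reference` is hypothetical and the function is not this library's implementation; it works on plain lists instead of tensors, mirrors the argument order of the signature above, and assumes that a `dones` entry of 1 marks a transition into a terminal state.

```python
def gae_reference(gamma, tau, rewards, dones, values, next_value):
    """Hypothetical list-based GAE sketch (not the library's implementation)."""
    advantages = [0.0] * len(rewards)
    running = 0.0           # advantage of the following step, A_{t+1}
    following = next_value  # value of the following state, V(s_{t+1})
    for t in reversed(range(len(rewards))):
        # Assumed convention: dones[t] == 1 means the episode ended at step t,
        # so nothing is bootstrapped from later steps.
        not_done = 1.0 - float(dones[t])
        # One-step TD error.
        delta = rewards[t] + gamma * following * not_done - values[t]
        # Exponentially weighted sum of TD errors, built backwards in time.
        running = delta + gamma * tau * not_done * running
        advantages[t] = running
        following = values[t]
    return advantages


# Toy three-step episode that terminates on the last step.
rewards = [1.0, 1.0, 1.0]
dones = [0, 0, 1]
values = [0.5, 0.5, 0.5]

adv_full = gae_reference(1.0, 1.0, rewards, dones, values, 0.0)
adv_td = gae_reference(1.0, 0.0, rewards, dones, values, 0.0)
```

With `gamma = 1` and `tau = 1`, each entry of `adv_full` equals the undiscounted return from that step minus the value estimate; with `tau = 0`, `adv_td` contains only the one-step TD errors.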