cherry.pg

cherry.pg.generalized_advantage(gamma, tau, rewards, dones, values, next_value)

Description

Computes the generalized advantage estimator (GAE).

References
  1. Schulman et al. 2015. “High-Dimensional Continuous Control Using Generalized Advantage Estimation”
  2. https://github.com/joschu/modular_rl/blob/master/modular_rl/core.py#L49
Arguments
  • gamma (float) - Discount factor.
  • tau (float) - Bias-variance trade-off.
  • rewards (tensor) - Tensor of rewards.
  • dones (tensor) - Tensor indicating episode termination. An entry is 1 if the transition led to a terminal (absorbing) state, and 0 otherwise.
  • values (tensor) - Values for the states producing the rewards.
  • next_value (tensor) - Value of the state obtained after the transition from the state used to compute the last value in values.
Returns
  • tensor - Tensor of advantages.
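The estimator sweeps backward through the trajectory, accumulating discounted TD errors and zeroing the bootstrap term at episode boundaries. A minimal sketch of that recursion (plain Python lists stand in for tensors here; `gae_sketch` is an illustrative name, not part of the cherry API):

```python
def gae_sketch(gamma, tau, rewards, dones, values, next_value):
    """Backward GAE recursion over one trajectory."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        mask = 1.0 - dones[t]  # terminal transition: drop the bootstrap
        # TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * mask - values[t]
        # A_t = delta_t + gamma * tau * A_{t+1}
        gae = delta + gamma * tau * mask * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages
```

Setting tau=0 reduces each advantage to the one-step TD error, while tau=1 recovers the full discounted return minus the baseline; intermediate values trade bias for variance.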
Example
mass, next_value = policy(replay[-1].next_state)
advantages = generalized_advantage(0.99,
                                   0.95,
                                   replay.reward(),
                                   replay.done(),
                                   replay.value(),
                                   next_value)