cherry.pg
cherry.pg.generalized_advantage(gamma, tau, rewards, dones, values, next_value)

Description
Computes the generalized advantage estimator (GAE).
References
- Schulman et al. 2015. “High-Dimensional Continuous Control Using Generalized Advantage Estimation”
- https://github.com/joschu/modular_rl/blob/master/modular_rl/core.py#L49
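The estimator can be sketched in plain Python as follows. This is a minimal illustration of the GAE recursion from Schulman et al. 2015, not cherry's actual (tensor-based) implementation; the function name `gae_sketch` is hypothetical.

```python
def gae_sketch(gamma, tau, rewards, dones, values, next_value):
    """Minimal GAE sketch over Python lists (illustrative, not cherry's code).

    Walks the trajectory backwards, accumulating the discounted sum of
    TD residuals: A_t = delta_t + gamma * tau * (1 - done_t) * A_{t+1}.
    """
    advantages = []
    advantage = 0.0
    next_v = next_value
    for t in reversed(range(len(rewards))):
        # Mask out bootstrapping across episode boundaries.
        mask = 1.0 - dones[t]
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) * mask - V(s_t)
        delta = rewards[t] + gamma * next_v * mask - values[t]
        # GAE recursion; tau trades bias (tau=0) against variance (tau=1).
        advantage = delta + gamma * tau * mask * advantage
        advantages.append(advantage)
        next_v = values[t]
    advantages.reverse()
    return advantages
```

With tau=0 this reduces to the one-step TD residual; with tau=1 it becomes the full Monte Carlo advantage estimate.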
Arguments
- gamma (float) - Discount factor.
- tau (float) - Bias-variance trade-off.
- rewards (tensor) - Tensor of rewards.
- dones (tensor) - Tensor indicating episode termination. Entry is 1 if the transition led to a terminal (absorbing) state, 0 otherwise.
- values (tensor) - Values for the states producing the rewards.
- next_value (tensor) - Value of the state obtained after the transition from the state used to compute the last value in values.
Returns
- tensor - Tensor of advantages.
Example
mass, next_value = policy(replay[-1].next_state)
advantages = generalized_advantage(0.99,
                                   0.95,
                                   replay.reward(),
                                   replay.done(),
                                   replay.value(),
                                   next_value)