cherry.pg
cherry.pg.generalized_advantage(gamma, tau, rewards, dones, values, next_value)

Description
Computes the generalized advantage estimator (GAE).
References
- Schulman et al. 2015. “High-Dimensional Continuous Control Using Generalized Advantage Estimation”
- https://github.com/joschu/modular_rl/blob/master/modular_rl/core.py#L49
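The estimator can be sketched in plain Python as follows. This is a minimal illustration of the GAE recursion from Schulman et al. 2015, not cherry's actual (tensor-based) implementation; the function name `gae_sketch` is hypothetical.

```python
def gae_sketch(gamma, tau, rewards, dones, values, next_value):
    """Minimal GAE sketch over Python lists (illustrative, not cherry's code).

    Walks the trajectory backwards, accumulating the discounted sum of
    TD residuals: A_t = delta_t + gamma * tau * (1 - done_t) * A_{t+1}.
    """
    advantages = []
    advantage = 0.0
    next_v = next_value
    for t in reversed(range(len(rewards))):
        # Mask out bootstrapping across episode boundaries.
        mask = 1.0 - dones[t]
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) * mask - V(s_t)
        delta = rewards[t] + gamma * next_v * mask - values[t]
        # GAE recursion; tau trades bias (tau=0) against variance (tau=1).
        advantage = delta + gamma * tau * mask * advantage
        advantages.append(advantage)
        next_v = values[t]
    advantages.reverse()
    return advantages
```

With tau=0 this reduces to the one-step TD residual; with tau=1 it becomes the full Monte Carlo advantage estimate.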
Arguments
- gamma (float) - Discount factor.
- tau (float) - Bias-variance trade-off.
- rewards (tensor) - Tensor of rewards.
- dones (tensor) - Tensor indicating episode termination. Entry is 1 if the transition led to a terminal (absorbing) state, 0 otherwise.
- values (tensor) - Values for the states producing the rewards.
- next_value (tensor) - Value of the state obtained after the transition from the state used to compute the last value in values.
Returns
- tensor - Tensor of advantages.
Example
mass, next_value = policy(replay[-1].next_state)
advantages = generalized_advantage(0.99,
                                   0.95,
                                   replay.reward(),
                                   replay.done(),
                                   replay.value(),
                                   next_value)