cherry.td

Description

Utilities to implement temporal difference algorithms.

discount

discount(gamma, rewards, dones, bootstrap=0.0)

Description

Discounts rewards at a rate of gamma, treating non-zero entries of dones as episode boundaries.

References

  1. Sutton, Richard, and Andrew Barto. 2018. Reinforcement Learning, Second Edition. The MIT Press.

Arguments

  * gamma (float) - Discount factor.
  * rewards (Tensor) - Tensor of rewards.
  * dones (Tensor) - Tensor indicating episode termination.
  * bootstrap (float or Tensor, optional, default=0.0) - Bootstrap value appended after the last reward.

Returns

  * Tensor - Tensor of discounted rewards.

Example

import cherry as ch
import torch as th

rewards = th.ones(23, 1) * 8
dones = th.zeros_like(rewards)
dones[-1] += 1.0
discounted = ch.td.discount(0.99,
                            rewards,
                            dones,
                            bootstrap=1.0)
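The computation behind discount can be sketched as a backward recursion over the trajectory. The discounted_returns function below is a hypothetical reference implementation, not cherry's actual code; it assumes the common convention that a done flag zeroes both the future return and the bootstrap value.

```python
import torch as th


def discounted_returns(gamma, rewards, dones, bootstrap=0.0):
    # Hypothetical sketch (not cherry's implementation) of the recursion:
    #   R_t = r_t + gamma * (1 - done_t) * R_{t+1},
    # seeded with `bootstrap` after the final step.
    R = bootstrap
    returns = th.zeros_like(rewards)
    for t in reversed(range(rewards.size(0))):
        R = rewards[t] + gamma * (1.0 - dones[t]) * R
        returns[t] = R
    return returns


rewards = th.ones(3, 1)
dones = th.zeros_like(rewards)
dones[-1] += 1.0  # the trajectory ends in a terminal state
returns = discounted_returns(0.99, rewards, dones, bootstrap=1.0)
# Because the last done flag is set, the bootstrap is zeroed out and
# returns[-1] equals the final reward.
```

Under this convention, passing a bootstrap only matters when the trajectory is truncated before a terminal state (i.e. the last done flag is zero).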

temporal_difference

temporal_difference(gamma, rewards, dones, values, next_values)

Description

Returns the temporal difference residual.

References

  1. Sutton, Richard S. 1988. “Learning to Predict by the Methods of Temporal Differences.” Machine Learning 3 (1): 9–44.
  2. Sutton, Richard, and Andrew Barto. 2018. Reinforcement Learning, Second Edition. The MIT Press.

Arguments

  * gamma (float) - Discount factor.
  * rewards (Tensor) - Tensor of rewards.
  * dones (Tensor) - Tensor indicating episode termination.
  * values (Tensor) - Value estimates for the current states.
  * next_values (Tensor) - Value estimates for the next states.

Returns

  * Tensor - Tensor of temporal difference residuals.
Example

# vf is a state-value function (e.g. a value network) and replay a
# cherry ExperienceReplay of collected transitions.
values = vf(replay.states())
next_values = vf(replay.next_states())
td_errors = ch.td.temporal_difference(0.99,
                                      replay.reward(),
                                      replay.done(),
                                      values,
                                      next_values)
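The residual returned here is the classic TD(0) error, delta_t = r_t + gamma * (1 - done_t) * V(s_{t+1}) - V(s_t). The td_residual function below is a hypothetical sketch of that formula on batched tensors, not cherry's actual code.

```python
import torch as th


def td_residual(gamma, rewards, dones, values, next_values):
    # Hypothetical sketch (not cherry's implementation) of the TD(0) error:
    #   delta_t = r_t + gamma * (1 - done_t) * V(s_{t+1}) - V(s_t)
    # A done flag masks out the next-state value at episode boundaries.
    return rewards + gamma * (1.0 - dones) * next_values - values


rewards = th.ones(5, 1)
dones = th.zeros(5, 1)
values = th.zeros(5, 1)        # stand-in value estimates
next_values = th.zeros(5, 1)
deltas = td_residual(0.99, rewards, dones, values, next_values)
# With zero value estimates, each residual reduces to the reward itself.
```

With accurate value estimates the residuals approach zero, which is why they serve both as a learning signal for the value function and as an advantage estimate for policy gradients.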