cherry.td

cherry.td.discount(gamma, rewards, dones, bootstrap=0.0)

Description

Discounts rewards at a rate of gamma.

References
  1. Sutton, Richard, and Andrew Barto. 2018. Reinforcement Learning, Second Edition. The MIT Press.
Arguments
  • gamma (float) - Discount factor.
  • rewards (tensor) - Tensor of rewards.
  • dones (tensor) - Tensor indicating episode termination. An entry is 1 if the transition led to a terminal (absorbing) state, and 0 otherwise.
  • bootstrap (float, optional, default=0.0) - Bootstrap the last reward with this value.
Returns
  • tensor - Tensor of discounted rewards.
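The returned tensor corresponds to the standard backward recursion for discounted returns, with dones cutting the accumulation at episode boundaries and bootstrap standing in for the return past the final transition. A minimal pure-PyTorch sketch of that recursion (an illustration of the semantics under that assumption, not the library's implementation) could look like:

import torch as th

def discount_sketch(gamma, rewards, dones, bootstrap=0.0):
    # Backward recursion: R_t = r_t + gamma * (1 - done_t) * R_{t+1},
    # with R past the last transition initialized to the bootstrap value.
    R = bootstrap
    discounted = th.zeros_like(rewards)
    for t in reversed(range(rewards.size(0))):
        R = rewards[t] + gamma * (1.0 - dones[t]) * R
        discounted[t] = R
    return discounted

With dones all zero and bootstrap set to an estimate of the value of the state reached after the last transition, this yields bootstrapped n-step returns.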
Example
import torch as th
import cherry as ch

# 23 transitions with reward 8; only the last transition is terminal.
rewards = th.ones(23, 1) * 8
dones = th.zeros_like(rewards)
dones[-1] += 1.0
discounted = ch.td.discount(0.99,
                            rewards,
                            dones,
                            bootstrap=1.0)

cherry.td.temporal_difference(gamma, rewards, dones, values, next_values)

Description

Returns the temporal difference residual.

References
  1. Sutton, Richard S. 1988. “Learning to Predict by the Methods of Temporal Differences.” Machine Learning 3 (1): 9–44.
  2. Sutton, Richard, and Andrew Barto. 2018. Reinforcement Learning, Second Edition. The MIT Press.
Arguments
  • gamma (float) - Discount factor.
  • rewards (tensor) - Tensor of rewards.
  • dones (tensor) - Tensor indicating episode termination. An entry is 1 if the transition led to a terminal (absorbing) state, and 0 otherwise.
  • values (tensor) - Values for the states producing the rewards.
  • next_values (tensor) - Values of the states reached after the transitions that produced the rewards (i.e. the next states).
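Concretely, the residual for each transition is the one-step TD error delta = reward + gamma * (1 - done) * next_value - value. A minimal pure-PyTorch sketch of that computation (a reference for the semantics under that assumption, not the library's implementation):

def temporal_difference_sketch(gamma, rewards, dones, values, next_values):
    # One-step TD residual; terminal transitions (done == 1) drop the
    # bootstrapped value of the next state.
    return rewards + gamma * (1.0 - dones) * next_values - values

These residuals are commonly used as one-step advantage estimates or as the error signal in TD(0) value-function updates (Sutton & Barto, 2018).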
Example
import cherry as ch

# vf is a state-value network; replay is an experience replay of transitions.
values = vf(replay.state())
next_values = vf(replay.next_state())
td_errors = ch.td.temporal_difference(0.99,
                                      replay.reward(),
                                      replay.done(),
                                      values,
                                      next_values)