cherry.models.utils

RandomPolicy

RandomPolicy(env, *args, **kwargs)

[Source]

Description

Policy that randomly samples actions from the environment's action space.

Arguments

Example

policy = ch.models.RandomPolicy(env)
env = ch.envs.Runner(env)
replay = env.run(policy, steps=2048)
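
For intuition, a rough equivalent of what such a policy does, assuming a Gym-style environment that exposes action_space.sample(); the class name and return format below are illustrative only, not cherry's implementation.

import torch as th

class UniformRandomPolicy(th.nn.Module):  # hypothetical name, for illustration only

    def __init__(self, env):
        super().__init__()
        self.env = env

    def forward(self, state):
        # The state is ignored; an action is sampled directly from the action space.
        return th.tensor([self.env.action_space.sample()])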

polyak_average

polyak_average(source, target, alpha)

[Source]

Description

Shifts the parameters of source towards those of target.

Note: alpha is the convex-combination weight given to the source, i.e. the old (source) parameters are kept at a rate of alpha.

References

  1. Polyak, B., and A. Juditsky. 1992. “Acceleration of Stochastic Approximation by Averaging.”

Arguments

Example

target_qf = nn.Linear(23, 34)
qf = nn.Linear(23, 34)
ch.models.polyak_average(target_qf, qf, alpha=0.9)
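
The call above is roughly equivalent to the parameter-wise update below, which keeps the old (source) parameters at a rate of alpha = 0.9; the in-place form is an assumption made for illustration.

for old_p, new_p in zip(target_qf.parameters(), qf.parameters()):
    # new source = alpha * old source + (1 - alpha) * target
    old_p.data.mul_(0.9).add_(new_p.data, alpha=0.1)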

cherry.models.tabular

StateValueFunction

StateValueFunction(state_size, init=None)

[Source]

Description

Stores a table of state values, V(s), one for each state.

Assumes that the states are one-hot encoded. Also, the returned values are differentiable and can be used in conjunction with PyTorch's optimizers.

Arguments

References

  1. Sutton, Richard, and Andrew Barto. 2018. Reinforcement Learning, Second Edition. The MIT Press.

Example

vf = StateValueFunction(env.state_size)
state = env.reset()
state = ch.onehot(state, env.state_size)
state_value = vf(state)
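
Because the table is differentiable, it can be trained like any other torch module; below is a minimal TD(0)-style sketch, where gamma, reward, and the one-hot encoded next_state are placeholders for quantities produced by an environment loop, and the use of parameters() assumes the table behaves as a regular torch module.

import torch
from torch import optim

optimizer = optim.SGD(vf.parameters(), lr=0.1)
td_target = reward + gamma * vf(next_state).detach()
loss = (td_target - vf(state)).pow(2).mul(0.5).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()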

ActionValueFunction

ActionValueFunction(state_size, action_size, init=None)

[Source]

Description

Stores a table of action values, Q(s, a), one for each (state, action) pair.

Assumes that the states and actions are one-hot encoded. Also, the returned values are differentiable and can be used in conjunction with PyTorch's optimizers.

Arguments

References

  1. Sutton, Richard, and Andrew Barto. 2018. Reinforcement Learning, Second Edition. The MIT Press.

Example

qf = ActionValueFunction(env.state_size, env.action_size)
state = env.reset()
state = ch.onehot(state, env.state_size)
all_action_values = qf(state)
action = ch.onehot(0, env.action_size)
action_value = qf(state, action)
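
Similarly, a one-step Q-learning-style update can be written on top of the table; reward, gamma, and the one-hot next_state below are placeholders, and the use of parameters() assumes the table behaves as a regular torch module.

import torch
from torch import optim

optimizer = optim.SGD(qf.parameters(), lr=0.1)
td_target = reward + gamma * qf(next_state).max().detach()
loss = (td_target - qf(state, action)).pow(2).mul(0.5).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()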

cherry.models.atari

NatureFeatures

NatureFeatures(input_size=4, hidden_size=512)

[Source]

Description

The convolutional body of the DQN architecture.

References

  1. Mnih et al. 2015. “Human-Level Control through Deep Reinforcement Learning.”
  2. Mnih et al. 2016. “Asynchronous Methods for Deep Reinforcement Learning.”

Credit

Adapted from Ilya Kostrikov's implementation.

Arguments
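
Example

A hedged usage sketch: the four stacked 84x84 frames and the (batch, 512) output shape are assumptions based on the standard DQN setup, not guarantees of this module.

import torch

features = NatureFeatures(input_size=4, hidden_size=512)
frames = torch.randn(8, 4, 84, 84)  # assumed: batch of 4 stacked, preprocessed frames
hidden = features(frames)  # assumed shape: (8, 512)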

NatureActor

NatureActor(input_size, output_size)

[Source]

Description

The actor head of the A3C architecture.

References

  1. Mnih et al. 2015. “Human-Level Control through Deep Reinforcement Learning.”
  2. Mnih et al. 2016. “Asynchronous Methods for Deep Reinforcement Learning.”

Credit

Adapted from Ilya Kostrikov's implementation.

Arguments
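
Example

A hedged sketch that reuses the hidden features from the NatureFeatures example above; the discrete action space (env.action_size) and the interpretation of the output as Categorical logits are assumptions.

from torch.distributions import Categorical

actor = NatureActor(512, env.action_size)
action_scores = actor(hidden)  # hidden computed by NatureFeatures, as above
density = Categorical(logits=action_scores)
action = density.sample()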

NatureCritic

NatureCritic(input_size, output_size=1)

[Source]

Description

The critic head of the A3C architecture.

References

  1. Mnih et al. 2015. “Human-Level Control through Deep Reinforcement Learning.”
  2. Mnih et al. 2016. “Asynchronous Methods for Deep Reinforcement Learning.”

Credit

Adapted from Ilya Kostrikov's implementation.

Arguments
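
Example

A hedged sketch reusing the same hidden features; the critic head is assumed to return one value per state.

critic = NatureCritic(512)
state_value = critic(hidden)  # assumed shape: (batch, 1)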

cherry.models.robotics

RoboticsMLP

RoboticsMLP(input_size, output_size, layer_sizes=None)

[Source]

Description

A multi-layer perceptron with weight initialization suited to continuous robotic control.

Credit

Adapted from Ilya Kostrikov's implementation.

Arguments

Example

target_qf = ch.models.robotics.RoboticsMLP(23, 34, layer_sizes=[32, 32])
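
The resulting module can then be applied to a batch of inputs like any other torch module; the random batch below is only a placeholder input.

import torch

batch = torch.randn(16, 23)
values = target_qf(batch)  # assumed output shape: (16, 34)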

RoboticsActor

RoboticsActor(input_size, output_size, layer_sizes=None)

[Source]

Description

A multi-layer perceptron with initialization designed for choosing actions in continuous robotic environments.

Credit

Adapted from Ilya Kostrikov's implementation.

Arguments

Example

policy_mean = ch.models.robotics.RoboticsActor(28, 8, layer_sizes=[64, 32, 16])
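
A hedged continuation: the actor output is often used as the mean of a Gaussian policy; the Normal density and the fixed standard deviation below are illustrative assumptions, not part of the API.

import torch
from torch.distributions import Normal

state = torch.randn(1, 28)
mean = policy_mean(state)
density = Normal(mean, 0.1 * torch.ones_like(mean))
action = density.sample()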

LinearValue

LinearValue(input_size, reg=1e-05)

[Source]

Description

A linear state-value function whose parameters are fit by regularized least squares.

Credit

Adapted from Tristan Deleu's implementation.

References

  1. Duan et al. 2016. “Benchmarking Deep Reinforcement Learning for Continuous Control.”
  2. https://github.com/tristandeleu/pytorch-maml-rl

Arguments

Example

states = replay.state()
rewards = replay.reward()
dones = replay.done()
returns = ch.td.discount(gamma, rewards, dones)
baseline = LinearValue(input_size)
baseline.fit(states, returns)
next_values = baseline(replay.next_state())
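
The fitted baseline is typically used to turn the discounted returns into advantage estimates; the subtraction below is plain tensor arithmetic, not a cherry-specific API.

values = baseline(states)
advantages = returns - values  # centered targets for a policy-gradient loss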