cherry.models

cherry.models.utils.polyak_average(source, target, alpha)

[Source]

Description

Shifts the parameters of source towards those of target.

Note: The parameter alpha is the convex-combination weight given to the source; i.e., the source's old parameters are kept at a rate of alpha, and the target's parameters are mixed in at a rate of 1 - alpha.

References
  1. Polyak, B., and A. Juditsky. 1992. “Acceleration of Stochastic Approximation by Averaging.”
Arguments
  • source (Module) - The module to be shifted.
  • target (Module) - The module indicating the shift direction.
  • alpha (float) - Strength of the shift.
Example
target_qf = nn.Linear(23, 34)
qf = nn.Linear(23, 34)
ch.models.polyak_average(target_qf, qf, alpha=0.9)
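
For reference, the call above amounts to the convex combination described in the note. The loop below is a minimal sketch of that update, meant only to illustrate the arithmetic, not to mirror cherry's exact implementation:

import torch.nn as nn

target_qf = nn.Linear(23, 34)
qf = nn.Linear(23, 34)
alpha = 0.9

# Keep a fraction alpha of the old (source) parameters and mix in
# a fraction (1 - alpha) of the target's parameters.
for s, t in zip(target_qf.parameters(), qf.parameters()):
    s.data.mul_(alpha).add_(t.data, alpha=1.0 - alpha)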

cherry.models.utils.RandomPolicy

[Source]

Description

Policy that randomly samples actions from the environment action space.

Example
policy = ch.models.RandomPolicy(env)
env = envs.Runner(env)
replay = env.run(policy, steps=2048)
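
Conceptually, RandomPolicy behaves like the following minimal module: a hedged sketch that ignores the observation and samples uniformly from the action space (illustrative only, not cherry's actual implementation, which may post-process the sampled action):

import torch.nn as nn

class UniformRandomPolicy(nn.Module):  # illustrative stand-in for RandomPolicy

    def __init__(self, env):
        super(UniformRandomPolicy, self).__init__()
        self.env = env

    def forward(self, state):
        # The observation is ignored; an action is drawn from the action space.
        return self.env.action_space.sample()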

__init__(self, env, *args, **kwargs) special

Arguments
  • env (Environment) - Environment from which to sample actions.

cherry.models.atari

cherry.models.atari.NatureFeatures

[Source]

Description

The convolutional body of the DQN architecture.

References
  1. Mnih et al. 2015. “Human-Level Control through Deep Reinforcement Learning.”
  2. Mnih et al. 2016. “Asynchronous Methods for Deep Reinforcement Learning.”
Credit

Adapted from Ilya Kostrikov's implementation.

__init__(self, input_size = 4, output_size = 512, hidden_size = 3136) special

Arguments
  • input_size (int) - Number of channels. (Stacked frames in original implementation.)
  • output_size (int, optional, default=512) - Size of the output representation.
  • hidden_size (int, optional, default=3136) - Size of the representation after the convolutional layers.
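
A minimal usage sketch, assuming the standard Atari preprocessing of 4 stacked 84x84 frames (the frame size is an assumption of this example and must match the default hidden_size; it is not part of the API):

import torch
import cherry as ch

features = ch.models.atari.NatureFeatures(input_size=4, output_size=512)
frames = torch.randn(8, 4, 84, 84)   # a batch of 8 observations of 4 stacked frames
representation = features(frames)    # expected shape: (8, 512)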

forward(self, input) inherited

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance itself instead of this method, since the former takes care of running the registered hooks while the latter silently ignores them.

cherry.models.atari.NatureActor

[Source]

Description

The actor head of the A3C architecture.

References
  1. Mnih et al. 2015. “Human-Level Control through Deep Reinforcement Learning.”
  2. Mnih et al. 2016. “Asynchronous Methods for Deep Reinforcement Learning.”
Credit

Adapted from Ilya Kostrikov's implementation.

__init__(self, input_size, output_size) special

Arguments
  • input_size (int) - Size of the input to the fully connected layers.
  • output_size (int) - Size of the action space.
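
A minimal sketch of attaching the actor head to the features body; the action-space size of 6 and the frame shape are arbitrary assumptions for illustration:

import torch
import cherry as ch

features = ch.models.atari.NatureFeatures(input_size=4, output_size=512)
actor = ch.models.atari.NatureActor(input_size=512, output_size=6)
frames = torch.randn(8, 4, 84, 84)
action_scores = actor(features(frames))   # one score per action, shape: (8, 6)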

forward(self, input: Tensor) -> Tensor inherited

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance itself instead of this method, since the former takes care of running the registered hooks while the latter silently ignores them.

cherry.models.atari.NatureCritic

[Source]

Description

The critic head of the A3C architecture.

References
  1. Mnih et al. 2015. “Human-Level Control through Deep Reinforcement Learning.”
  2. Mnih et al. 2016. “Asynchronous Methods for Deep Reinforcement Learning.”
Credit

Adapted from Ilya Kostrikov's implementation.

__init__(self, input_size, output_size = 1) special

Arguments
  • input_size (int) - Size of the input to the fully connected layers.
  • output_size (int, optional, default=1) - Size of the value.
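
A minimal sketch of attaching the critic head to the features body (the frame shape is the same illustrative assumption as above):

import torch
import cherry as ch

features = ch.models.atari.NatureFeatures(input_size=4, output_size=512)
critic = ch.models.atari.NatureCritic(input_size=512)
frames = torch.randn(8, 4, 84, 84)
state_values = critic(features(frames))   # one value per observation, shape: (8, 1)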

forward(self, input: Tensor) -> Tensor inherited

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance itself instead of this method, since the former takes care of running the registered hooks while the latter silently ignores them.

cherry.models.robotics

cherry.models.robotics.RoboticsMLP

[Source]

Description

A multi-layer perceptron with proper initialization for robotic control.

Credit

Adapted from Ilya Kostrikov's implementation.

Example
target_qf = ch.models.robotics.RoboticsMLP(23, 34, layer_sizes=[32, 32])
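
As a follow-up to the example above, the module is called like any other PyTorch module; the batch size of 16 is arbitrary:

import torch

states = torch.randn(16, 23)
values = target_qf(states)   # shape: (16, 34)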

__init__(self, input_size, output_size, layer_sizes = None) special

Arguments
  • input_size (int) - Size of input.
  • output_size (int) - Size of output.
  • layer_sizes (list, optional, default=None) - A list of ints, each indicating the size of a hidden layer. (Defaults to two hidden layers of 64 units.)

forward(self, x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance itself instead of this method, since the former takes care of running the registered hooks while the latter silently ignores them.

cherry.models.robotics.RoboticsActor

[Source]

Description

A multi-layer perceptron with initialization designed for choosing actions in continuous robotic environments.

Credit

Adapted from Ilya Kostrikov's implementation.

Example
policy_mean = ch.models.robotics.RoboticsActor(28, 8, layer_sizes=[64, 32, 16])
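
Continuing the example above, a forward pass maps a batch of states to the mean of the action distribution; the batch size of 16 is arbitrary:

import torch

states = torch.randn(16, 28)
action_means = policy_mean(states)   # shape: (16, 8)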

__init__(self, input_size, output_size, layer_sizes = None) special

Arguments
  • input_size (int) - Size of input.
  • output_size (int) - Size of the action space.
  • layer_sizes (list, optional, default=None) - A list of ints, each indicating the size of a hidden layer. (Defaults to two hidden layers of 64 units.)

forward(self, x) inherited

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance itself instead of this method, since the former takes care of running the registered hooks while the latter silently ignores them.

cherry.models.robotics.LinearValue

[Source]

Description

A linear state-value function, whose parameters are fit by least squares.

Credit

Adapted from Tristan Deleu's implementation.

References
  1. Duan et al. 2016. “Benchmarking Deep Reinforcement Learning for Continuous Control.”
  2. https://github.com/tristandeleu/pytorch-maml-rl
Example
states = replay.state()
rewards = replay.reward()
dones = replay.done()
returns = ch.td.discount(gamma, rewards, dones)
baseline = LinearValue(input_size)
baseline.fit(states, returns)
next_values = baseline(replay.next_state())

__init__(self, input_size, reg = 1e-05) special

Arguments
  • input_size (int) - Size of input.
  • reg (float, optional, default=1e-5) - Regularization coefficient.

fit(self, states, returns)

Description

Fits the parameters of the linear model by the method of least-squares.

Arguments
  • states (tensor) - States collected with the policy to evaluate.
  • returns (tensor) - Returns associated with those states (i.e., discounted rewards).
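
For intuition, regularized least squares has the closed-form solution w = (A^T A + reg * I)^{-1} A^T R, where A collects the features of the visited states and R the associated returns. The sketch below illustrates that solve on raw state features; the exact feature map used internally by LinearValue (e.g. whether states are augmented with time or polynomial terms, as in Duan et al.) is left as an assumption here:

import torch

def least_squares_weights(features, returns, reg=1e-5):
    # Solve (A^T A + reg * I) w = A^T R for the linear weights w.
    A, R = features, returns
    gram = A.t() @ A + reg * torch.eye(A.size(1))
    return torch.linalg.solve(gram, A.t() @ R)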

forward(self, state)

Description

Computes the value of a state using the linear function approximator.

Arguments
  • state (Tensor) - The state to evaluate.

cherry.models.tabular

cherry.models.tabular.StateValueFunction

[Source]

Description

Stores a table of state values, V(s), one for each state.

Assumes that the states are one-hot encoded. Also, the returned values are differentiable and can be used in conjunction with PyTorch's optimizers.

References
  1. Sutton, Richard, and Andrew Barto. 2018. Reinforcement Learning, Second Edition. The MIT Press.
Example
vf = StateValueFunction(env.state_size)
state = env.reset()
state = ch.onehot(state, env.state_size)
state_value = vf(state)
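
Because the table's values are differentiable, they can be trained with a standard PyTorch optimizer. Below is a minimal sketch of a single TD(0)-style update; the state indices, reward, and discount are made-up illustration values:

import torch
import cherry as ch
from cherry.models.tabular import StateValueFunction

vf = StateValueFunction(state_size=5)
optimizer = torch.optim.SGD(vf.parameters(), lr=0.1)

state = ch.onehot(0, 5)
next_state = ch.onehot(1, 5)
reward, gamma = 1.0, 0.99

# One TD(0) step: move V(s) towards r + gamma * V(s')
td_target = reward + gamma * vf(next_state).detach()
loss = 0.5 * (td_target - vf(state)).pow(2).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()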

__init__(self, state_size, init = None) special

Arguments
  • state_size (int) - The number of states in the environment.
  • init (function, optional, default=None) - The initialization scheme for the values in the table. (Default is 0.)

forward(self, state)

Description

Returns the state value of a one-hot encoded state.

Arguments
  • state (Tensor) - State to be evaluated.

cherry.models.tabular.ActionValueFunction

[Source]

Description

Stores a table of action values, Q(s, a), one for each (state, action) pair.

Assumes that the states and actions are one-hot encoded. Also, the returned values are differentiable and can be used in conjunction with PyTorch's optimizers.

References
  1. Sutton, Richard, and Andrew Barto. 2018. Reinforcement Learning, Second Edition. The MIT Press.
Example
qf = ActionValueFunction(env.state_size, env.action_size)
state = env.reset()
state = ch.onehot(state, env.state_size)
all_action_values = qf(state)
action = ch.onehot(0, env.action_size)
action_value = qf(state, action)
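
As with the state-value table, the action values are differentiable, so a Q-learning-style update can be written directly against a PyTorch optimizer. A minimal sketch with made-up sizes, indices, reward, and discount:

import torch
import cherry as ch
from cherry.models.tabular import ActionValueFunction

qf = ActionValueFunction(state_size=5, action_size=2)
optimizer = torch.optim.SGD(qf.parameters(), lr=0.1)

state = ch.onehot(0, 5)
action = ch.onehot(1, 2)
next_state = ch.onehot(1, 5)
reward, gamma = 1.0, 0.99

# One Q-learning step: move Q(s, a) towards r + gamma * max_a' Q(s', a')
target = reward + gamma * qf(next_state).max().detach()
loss = 0.5 * (target - qf(state, action)).pow(2).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()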

__init__(self, state_size, action_size, init = None) special

Arguments
  • state_size (int) - The number of states in the environment.
  • action_size (int) - The number of actions per state.
  • init (function, optional, default=None) - The initialization scheme for the values in the table. (Default is 0.)

forward(self, state, action = None)

Description

Returns the action value of a one-hot encoded state and one-hot encoded action.

Arguments
  • state (Tensor) - State to be evaluated.
  • action (Tensor, optional, default=None) - Action to be evaluated. If None, the values of all actions for the given state are returned.