cherry.models¶

cherry.models.utils.polyak_average(source, target, alpha)¶

[Source]

Description¶

Shifts the parameters of source towards those of target.

Note: the parameter alpha indicates the convex combination weight of the source. (i.e. the old parameters are kept at a rate of alpha.)

References¶
1. Polyak, B., and A. Juditsky. 1992. “Acceleration of Stochastic Approximation by Averaging.”
Arguments¶
• source (Module) - The module to be shifted.
• target (Module) - The module indicating the shift direction.
• alpha (float) - Strength of the shift.
Example¶
target_qf = nn.Linear(23, 34)
qf = nn.Linear(23, 34)
ch.models.polyak_average(target_qf, qf, alpha=0.9)
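The convex-combination semantics can be illustrated with a minimal stand-in for the function (a sketch; cherry's actual implementation may differ in details):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for cherry's polyak_average: each source
# parameter becomes a convex combination, keeping the old (source)
# value at a rate of alpha.
def polyak_average(source, target, alpha):
    for s, t in zip(source.parameters(), target.parameters()):
        s.data.mul_(alpha).add_(t.data, alpha=1.0 - alpha)

torch.manual_seed(0)
src = nn.Linear(4, 4)
tgt = nn.Linear(4, 4)
old = src.weight.data.clone()
polyak_average(src, tgt, alpha=0.9)
# Each new source weight is 0.9 * old + 0.1 * target.
assert torch.allclose(src.weight.data, 0.9 * old + 0.1 * tgt.weight.data)
```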


 cherry.models.utils.RandomPolicy (Module) ¶

[Source]

Description¶

Policy that randomly samples actions from the environment action space.

Example¶
policy = ch.models.RandomPolicy(env)
env = envs.Runner(env)
replay = env.run(policy, steps=2048)
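A minimal sketch of what such a policy can look like, assuming a Gym-style environment whose action_space exposes .sample() (cherry's actual RandomPolicy may differ):

```python
import torch.nn as nn

# Sketch of a random policy: ignores its inputs and samples uniformly
# from the environment's action space.
class RandomPolicy(nn.Module):
    def __init__(self, env, *args, **kwargs):
        super().__init__()
        self.env = env

    def forward(self, *args, **kwargs):
        # Delegate action selection to the environment's action space.
        return self.env.action_space.sample()
```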


__init__(self, env, *args, **kwargs) special ¶

Arguments¶
• env (Environment) - Environment from which to sample actions.

cherry.models.atari¶

 cherry.models.atari.NatureFeatures (Sequential) ¶

[Source]

Description¶

The convolutional body of the DQN architecture.

References¶
1. Mnih et al. 2015. “Human-Level Control through Deep Reinforcement Learning.”
2. Mnih et al. 2016. “Asynchronous Methods for Deep Reinforcement Learning.”

__init__(self, input_size = 4, output_size = 512, hidden_size = 3136) special ¶

Arguments¶
• input_size (int) - Number of channels. (Stacked frames in original implementation.)
• output_size (int, optional, default=512) - Size of the output representation.
• hidden_size (int, optional, default=3136) - Size of the representation after the convolutional layers.
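The hidden_size default follows from the standard Nature-DQN convolutional stack applied to 84x84 inputs, which produces 64 x 7 x 7 = 3136 features. A sketch of that body (cherry's NatureFeatures may differ in initialization details):

```python
import torch
import torch.nn as nn

# Nature-DQN conv body (Mnih et al. 2015): 4 stacked 84x84 frames in,
# a 512-dimensional representation out.
body = nn.Sequential(
    nn.Conv2d(4, 32, kernel_size=8, stride=4),   # 84x84 -> 20x20
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2),  # 20x20 -> 9x9
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, stride=1),  # 9x9 -> 7x7
    nn.ReLU(),
    nn.Flatten(),                                # 64 * 7 * 7 = 3136
    nn.Linear(3136, 512),
    nn.ReLU(),
)

x = torch.randn(1, 4, 84, 84)  # batch of 4 stacked 84x84 frames
assert body(x).shape == (1, 512)
```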

forward(self, input) inherited ¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.

 cherry.models.atari.NatureActor (Linear) ¶

[Source]

Description¶

The actor head of the A3C architecture.

References¶
1. Mnih et al. 2015. “Human-Level Control through Deep Reinforcement Learning.”
2. Mnih et al. 2016. “Asynchronous Methods for Deep Reinforcement Learning.”

Arguments¶
• input_size (int) - Size of the input to the fully connected layer.
• output_size (int) - Size of the action space.

forward(self, input: Tensor) -> Tensor inherited ¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.

 cherry.models.atari.NatureCritic (Linear) ¶

[Source]

Description¶

The critic head of the A3C architecture.

References¶
1. Mnih et al. 2015. “Human-Level Control through Deep Reinforcement Learning.”
2. Mnih et al. 2016. “Asynchronous Methods for Deep Reinforcement Learning.”

__init__(self, input_size, output_size = 1) special ¶

Arguments¶
• input_size (int) - Size of the input to the fully connected layer.
• output_size (int, optional, default=1) - Size of the value.
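The actor and critic heads are plain linear layers on top of the shared feature body. A hedged sketch of the A3C-style assembly, using plain nn.Linear layers as stand-ins (cherry's NatureActor and NatureCritic add their own initialization on top of Linear):

```python
import torch
import torch.nn as nn

# Shared body (kept tiny here for illustration), plus the two heads.
features = nn.Sequential(nn.Flatten(), nn.Linear(4 * 84 * 84, 512), nn.ReLU())
actor = nn.Linear(512, 6)    # one logit per action (6 is hypothetical)
critic = nn.Linear(512, 1)   # scalar state value V(s)

obs = torch.randn(1, 4, 84, 84)
phi = features(obs)
logits = actor(phi)
value = critic(phi)
assert logits.shape == (1, 6) and value.shape == (1, 1)
```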

forward(self, input: Tensor) -> Tensor inherited ¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.

cherry.models.robotics¶

 cherry.models.robotics.RoboticsMLP (Module) ¶

[Source]

Description¶

A multi-layer perceptron with proper initialization for robotic control.

Example¶
target_qf = ch.models.robotics.RoboticsMLP(23, 34, layer_sizes=[32, 32])


__init__(self, input_size, output_size, layer_sizes = None) special ¶

Arguments¶
• input_size (int) - Size of the input.
• output_size (int) - Size of output.
• layer_sizes (list, optional, default=None) - A list of ints, each indicating the size of a hidden layer. (Defaults to two hidden layers of 64 units.)
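As a rough sketch of the idea, an MLP for continuous control is often built with tanh activations and orthogonal weight initialization; cherry's RoboticsMLP uses its own initialization scheme, so treat the details below as illustrative assumptions:

```python
import torch
import torch.nn as nn

# Sketch of an MLP with initialization commonly used for robotic control.
def robotics_mlp(input_size, output_size, layer_sizes=None):
    if layer_sizes is None:
        layer_sizes = [64, 64]  # matches the documented default
    sizes = [input_size] + layer_sizes
    layers = []
    for in_s, out_s in zip(sizes[:-1], sizes[1:]):
        linear = nn.Linear(in_s, out_s)
        nn.init.orthogonal_(linear.weight)  # orthogonal init (assumption)
        nn.init.zeros_(linear.bias)
        layers += [linear, nn.Tanh()]
    layers.append(nn.Linear(sizes[-1], output_size))
    return nn.Sequential(*layers)

qf = robotics_mlp(23, 34, layer_sizes=[32, 32])
out = qf(torch.randn(5, 23))
assert out.shape == (5, 34)
```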

forward(self, x)¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.

 cherry.models.robotics.RoboticsActor (RoboticsMLP) ¶

[Source]

Description¶

A multi-layer perceptron with initialization designed for choosing actions in continuous robotic environments.

Example¶
policy_mean = ch.models.robotics.RoboticsActor(28, 8, layer_sizes=[64, 32, 16])


__init__(self, input_size, output_size, layer_sizes = None) special ¶

Arguments¶
• input_size (int) - Size of the input.
• output_size (int) - Size of the action space.
• layer_sizes (list, optional, default=None) - A list of ints, each indicating the size of a hidden layer. (Defaults to two hidden layers of 64 units.)

forward(self, x) inherited ¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.

 cherry.models.robotics.LinearValue (Module) ¶

[Source]

Description¶

A linear state-value function, whose parameters are found by minimizing least-squares.

References¶
1. Duan et al. 2016. “Benchmarking Deep Reinforcement Learning for Continuous Control.”
2. https://github.com/tristandeleu/pytorch-maml-rl
Example¶
states = replay.state()
rewards = replay.reward()
dones = replay.done()
returns = ch.td.discount(gamma, rewards, dones)
baseline = LinearValue(input_size)
baseline.fit(states, returns)
next_values = baseline(replay.next_state())


__init__(self, input_size, reg = 1e-05) special ¶

Arguments¶
• input_size (int) - Size of the input.
• reg (float, optional, default=1e-5) - Regularization coefficient.

fit(self, states, returns)¶

Description¶

Fits the parameters of the linear model by the method of least-squares.

Arguments¶
• states (tensor) - States collected with the policy to evaluate.
• returns (tensor) - Returns associated with those states (i.e., discounted rewards).
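The fit amounts to solving a regularized normal equation. A minimal sketch of the idea, assuming a plain bias-augmented design matrix (cherry's LinearValue additionally builds handcrafted features from the states):

```python
import torch

# Regularized least squares: solve (X^T X + reg * I) w = X^T y.
def lstsq_fit(states, returns, reg=1e-5):
    X = torch.cat([states, torch.ones(states.shape[0], 1)], dim=1)  # bias column
    A = X.t() @ X + reg * torch.eye(X.shape[1])
    b = X.t() @ returns
    return torch.linalg.solve(A, b)

torch.manual_seed(0)
states = torch.randn(100, 4)
true_w = torch.randn(5, 1)
returns = torch.cat([states, torch.ones(100, 1)], dim=1) @ true_w
w = lstsq_fit(states, returns)
assert torch.allclose(w, true_w, atol=1e-2)  # recovers the generating weights
```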

forward(self, state)¶

Description¶

Computes the value of a state using the linear function approximator.

Arguments¶
• state (Tensor) - The state to evaluate.

cherry.models.tabular¶

 cherry.models.tabular.StateValueFunction (Module) ¶

[Source]

Description¶

Stores a table of state values, V(s), one for each state.

Assumes that the states are one-hot encoded. Also, the returned values are differentiable and can be used in conjunction with PyTorch's optimizers.

References¶
1. Sutton, Richard, and Andrew Barto. 2018. Reinforcement Learning, Second Edition. The MIT Press.
Example¶
vf = StateValueFunction(env.state_size)
state = env.reset()
state = ch.onehot(state, env.state_size)
state_value = vf(state)


__init__(self, state_size, init = None) special ¶

Arguments¶
• state_size (int) - The number of states in the environment.
• init (function, optional, default=None) - The initialization scheme for the values in the table. (Default is 0.)
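Conceptually, the table can be realized as a learnable parameter matrix queried with a one-hot state vector, which is what makes the returned values differentiable. A minimal sketch (cherry's implementation may differ in initialization and shape conventions):

```python
import torch
import torch.nn as nn

# Differentiable tabular V(s): a one-hot state times a learnable table.
class StateValueFunction(nn.Module):
    def __init__(self, state_size, init=None):
        super().__init__()
        values = torch.zeros(state_size, 1) if init is None else init(state_size)
        self.values = nn.Parameter(values)

    def forward(self, state):
        # The matmul picks out the table entry of the one-hot state.
        return state @ self.values

vf = StateValueFunction(5)
state = torch.zeros(1, 5)
state[0, 2] = 1.0  # one-hot encoding of state 2
assert vf(state).shape == (1, 1)
```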

forward(self, state)¶

Description¶

Returns the state value of a one-hot encoded state.

Arguments¶
• state (Tensor) - State to be evaluated.

 cherry.models.tabular.ActionValueFunction (ActionValue) ¶

[Source]

Description¶

Stores a table of action values, Q(s, a), one for each (state, action) pair.

Assumes that the states and actions are one-hot encoded. Also, the returned values are differentiable and can be used in conjunction with PyTorch's optimizers.

References¶
1. Sutton, Richard, and Andrew Barto. 2018. Reinforcement Learning, Second Edition. The MIT Press.
Example¶
qf = ActionValueFunction(env.state_size, env.action_size)
state = env.reset()
state = ch.onehot(state, env.state_size)
all_action_values = qf(state)
action = ch.onehot(0, env.action_size)
action_value = qf(state, action)


__init__(self, state_size, action_size, init = None) special ¶

Arguments¶
• state_size (int) - The number of states in the environment.
• action_size (int) - The number of actions per state.
• init (function, optional, default=None) - The initialization scheme for the values in the table. (Default is 0.)
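As with the state-value table, the two call signatures (all action values, or the value of one action) can be sketched with a learnable state-by-action parameter matrix queried with one-hot vectors; cherry's implementation may differ in details:

```python
import torch
import torch.nn as nn

# Differentiable tabular Q(s, a): a learnable state x action table.
class ActionValueFunction(nn.Module):
    def __init__(self, state_size, action_size, init=None):
        super().__init__()
        table = (torch.zeros(state_size, action_size)
                 if init is None else init(state_size, action_size))
        self.table = nn.Parameter(table)

    def forward(self, state, action=None):
        q_all = state @ self.table  # values of all actions in `state`
        if action is None:
            return q_all
        # One-hot action selects a single Q(s, a).
        return (q_all * action).sum(dim=1, keepdim=True)

qf = ActionValueFunction(4, 3)
state = torch.eye(4)[0:1]          # one-hot state 0
assert qf(state).shape == (1, 3)   # all action values
```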

forward(self, state, action = None)¶

Description¶

Returns the action value of a one-hot encoded state and one-hot encoded action.

Arguments¶
• state (Tensor) - State to be evaluated.
• action (Tensor) - Action to be evaluated.