cherry.models¶
cherry.models.utils.polyak_average(source, target, alpha)
¶
Description¶
Shifts the parameters of source towards those of target.
Note: the parameter alpha indicates the convex combination weight of the source (i.e. the old parameters are kept at a rate of alpha).
References¶
- Polyak, B., and A. Juditsky. 1992. “Acceleration of Stochastic Approximation by Averaging.”
Arguments¶
source (Module) - The module to be shifted.
target (Module) - The module indicating the shift direction.
alpha (float) - Strength of the shift.
Example¶
import torch.nn as nn
import cherry as ch

target_qf = nn.Linear(23, 34)
qf = nn.Linear(23, 34)
ch.models.polyak_average(target_qf, qf, alpha=0.9)
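For intuition, each parameter of source roughly undergoes the following convex combination. This is a minimal sketch of the semantics described above, not cherry's exact implementation:

def polyak_sketch(source, target, alpha):
    # Keep the old (source) parameters at rate alpha and
    # shift towards the target parameters at rate (1 - alpha).
    for s, t in zip(source.parameters(), target.parameters()):
        s.data.mul_(alpha).add_(t.data, alpha=1.0 - alpha)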
cherry.models.utils.RandomPolicy
¶
cherry.models.atari¶
cherry.models.atari.NatureFeatures
¶
Description¶
The convolutional body of the DQN architecture.
References¶
- Mnih et al. 2015. “Human-Level Control through Deep Reinforcement Learning.”
- Mnih et al. 2016. “Asynchronous Methods for Deep Reinforcement Learning.”
Credit¶
Adapted from Ilya Kostrikov's implementation.
__init__(self, input_size = 4, output_size = 512, hidden_size = 3136)
special
¶
Arguments¶
input_size (int, optional, default=4) - Number of channels. (Stacked frames in the original implementation.)
output_size (int, optional, default=512) - Size of the output representation.
hidden_size (int, optional, default=3136) - Size of the representation after the convolutional layers.
forward(self, input)
inherited
¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
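A minimal usage sketch, assuming 84x84 observations stacked over 4 frames as in the original DQN setup; the batch size of 32 is an arbitrary placeholder:

import torch
from cherry.models.atari import NatureFeatures

features = NatureFeatures(input_size=4, output_size=512)
frames = torch.randn(32, 4, 84, 84)  # batch of stacked Atari frames
representation = features(frames)    # (32, 512) feature vectors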
cherry.models.atari.NatureActor
¶
Description¶
The actor head of the A3C architecture.
References¶
- Mnih et al. 2015. “Human-Level Control through Deep Reinforcement Learning.”
- Mnih et al. 2016. “Asynchronous Methods for Deep Reinforcement Learning.”
Credit¶
Adapted from Ilya Kostrikov's implementation.
Arguments¶
input_size (int) - Size of the input to the fully-connected layers.
output_size (int) - Size of the action space.
__init__(self, input_size, output_size)
special
¶
forward(self, input: Tensor) -> Tensor
inherited
¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
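A hedged sketch of how the actor head is typically composed with NatureFeatures to form a discrete policy; the action-space size of 6 and the Categorical distribution are illustrative choices, not part of this class:

import torch
from torch.distributions import Categorical
from cherry.models.atari import NatureActor, NatureFeatures

features = NatureFeatures(input_size=4, output_size=512)
actor = NatureActor(input_size=512, output_size=6)
x = features(torch.randn(1, 4, 84, 84))
density = Categorical(logits=actor(x))  # treat the head's output as action logits
action = density.sample()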
cherry.models.atari.NatureCritic
¶
Description¶
The critic head of the A3C architecture.
References¶
- Mnih et al. 2015. “Human-Level Control through Deep Reinforcement Learning.”
- Mnih et al. 2016. “Asynchronous Methods for Deep Reinforcement Learning.”
Credit¶
Adapted from Ilya Kostrikov's implementation.
__init__(self, input_size, output_size = 1)
special
¶
Arguments¶
input_size (int) - Size of the input to the fully-connected layers.
output_size (int, optional, default=1) - Size of the value output.
forward(self, input: Tensor) -> Tensor
inherited
¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
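Similarly, a hedged sketch of the critic head predicting a state value from the shared features; shapes follow the defaults documented above:

import torch
from cherry.models.atari import NatureCritic, NatureFeatures

features = NatureFeatures(input_size=4, output_size=512)
critic = NatureCritic(input_size=512)
x = features(torch.randn(1, 4, 84, 84))
state_value = critic(x)  # (1, 1) state-value estimate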
cherry.models.robotics¶
cherry.models.robotics.RoboticsMLP
¶
Description¶
A multi-layer perceptron with proper initialization for robotic control.
Credit¶
Adapted from Ilya Kostrikov's implementation.
Example¶
import cherry as ch

target_qf = ch.models.robotics.RoboticsMLP(23, 34, layer_sizes=[32, 32])
__init__(self, input_size, output_size, layer_sizes = None)
special
¶
Arguments¶
input_size (int) - Size of the input.
output_size (int) - Size of the output.
layer_sizes (list, optional, default=None) - A list of ints, each indicating the size of a hidden layer. (Defaults to two hidden layers of 64 units.)
forward(self, x)
¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
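A hedged sketch of a forward pass through the MLP built in the example above; the random input is a placeholder for a batch of states:

import torch
import cherry as ch

target_qf = ch.models.robotics.RoboticsMLP(23, 34, layer_sizes=[32, 32])
x = torch.randn(8, 23)  # batch of 8 placeholder states
values = target_qf(x)   # (8, 34) outputs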
cherry.models.robotics.RoboticsActor
¶
Description¶
A multi-layer perceptron with initialization designed for choosing actions in continuous robotic environments.
Credit¶
Adapted from Ilya Kostrikov's implementation.
Example¶
import cherry as ch

policy_mean = ch.models.robotics.RoboticsActor(28, 8, layer_sizes=[64, 32, 16])
__init__(self, input_size, output_size, layer_sizes = None)
special
¶
Arguments¶
input_size (int) - Size of the input.
output_size (int) - Size of the action space.
layer_sizes (list, optional, default=None) - A list of ints, each indicating the size of a hidden layer. (Defaults to two hidden layers of 64 units.)
forward(self, x)
inherited
¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
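A hedged sketch of the common way such an actor is used, namely as the mean of a diagonal Gaussian policy; the Normal distribution and the fixed standard deviation below are illustrative choices, not part of cherry's API:

import torch
import cherry as ch
from torch.distributions import Normal

policy_mean = ch.models.robotics.RoboticsActor(28, 8, layer_sizes=[64, 32, 16])
state = torch.randn(1, 28)  # placeholder observation
mean = policy_mean(state)   # (1, 8) action means
density = Normal(mean, 0.1 * torch.ones_like(mean))
action = density.sample()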
cherry.models.robotics.LinearValue
¶
Description¶
A linear state-value function, whose parameters are found by minimizing a least-squares objective.
Credit¶
Adapted from Tristan Deleu's implementation.
References¶
- Duan et al. 2016. “Benchmarking Deep Reinforcement Learning for Continuous Control.”
- https://github.com/tristandeleu/pytorch-maml-rl
Example¶
states = replay.state()
rewards = replay.reward()
dones = replay.done()
returns = ch.td.discount(gamma, rewards, dones)  # gamma: discount factor
baseline = LinearValue(input_size)               # input_size: state dimensionality
baseline.fit(states, returns)
next_values = baseline(replay.next_state())
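A hedged sketch of how the fitted baseline is commonly used, namely to form advantages for a policy-gradient update; the names replay, returns, and baseline follow the example above, and the returns-minus-baseline estimator is the standard recipe rather than something specific to this class:

values = baseline(replay.state())
advantages = returns - values.detach()  # centre the returns with the learned baseline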
cherry.models.tabular¶
cherry.models.tabular.StateValueFunction
¶
Description¶
Stores a table of state values, V(s), one for each state.
Assumes that the states are one-hot encoded. Also, the returned values are differentiable and can be used in conjunction with PyTorch's optimizers.
References¶
- Sutton, Richard, and Andrew Barto. 2018. Reinforcement Learning, Second Edition. The MIT Press.
Example¶
import cherry as ch
from cherry.models.tabular import StateValueFunction

vf = StateValueFunction(env.state_size)
state = env.reset()
state = ch.onehot(state, env.state_size)
state_value = vf(state)
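Because the returned values are differentiable, the table can be trained with a standard PyTorch optimizer. The following is a minimal hedged sketch of a one-step TD(0)-style update, assuming the table exposes .parameters() as the description implies; the learning rate and td_target are placeholders for a real target such as reward + gamma * V(next_state):

import torch

optimizer = torch.optim.SGD(vf.parameters(), lr=0.1)
value = vf(state)                   # vf and state from the example above
td_target = torch.ones_like(value)  # placeholder TD target
loss = (td_target - value).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()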
cherry.models.tabular.ActionValueFunction
¶
Description¶
Stores a table of action values, Q(s, a), one for each (state, action) pair.
Assumes that the states and actions are one-hot encoded. Also, the returned values are differentiable and can be used in conjunction with PyTorch's optimizers.
References¶
- Sutton, Richard, and Andrew Barto. 2018. Reinforcement Learning, Second Edition. The MIT Press.
Example¶
import cherry as ch
from cherry.models.tabular import ActionValueFunction

qf = ActionValueFunction(env.state_size, env.action_size)
state = env.reset()
state = ch.onehot(state, env.state_size)
all_action_values = qf(state)
action = ch.onehot(0, env.action_size)
action_value = qf(state, action)
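A hedged sketch of greedy action selection on top of the table; the argmax and the re-encoding into a one-hot action are illustrative, and the names qf, state, all_action_values, ch, and env follow the example above:

greedy = all_action_values.argmax().item()          # index of the highest-valued action
greedy_action = ch.onehot(greedy, env.action_size)  # one-hot, as expected by qf(state, action)
greedy_value = qf(state, greedy_action)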