cherry.nn¶
cherry.nn.policy.Policy
¶
Abstract Module to represent policies.
Subclassing this module helps retain a unified API across codebases,
and also automatically defines some helper functions
(you only need to ensure that forward returns a Distribution instance).
Example¶
import torch
import cherry
from cherry.nn.policy import Policy

class RandomPolicy(Policy):

    def __init__(self, num_actions=5):
        super(RandomPolicy, self).__init__()
        self.num_actions = num_actions

    def forward(self, state):  # must return a density
        probs = torch.ones(self.num_actions) / self.num_actions
        density = cherry.distributions.Categorical(probs=probs)
        return density

# We can now use some predefined functions:
random_policy = RandomPolicy()
actions = random_policy.act(states, deterministic=True)
log_probs = random_policy.log_probs(states, actions)
__init__(self) -> None (inherited, special)
¶
act(self, state, deterministic = False)
¶
Description¶
Given a state, samples an action from the policy.
If deterministic=True, the action is the mode of the policy distribution.
Arguments¶
- state (Tensor) - State to take an action in.
- deterministic (bool, optional, default=False) - Whether the action is sampled (False) or the mode of the policy distribution is returned (True).
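As a rough illustration (not the library's implementation), acting amounts to sampling from the density returned by forward, or taking its mode when deterministic=True. The helper below is a minimal sketch that assumes the returned distribution exposes sample() and a mode attribute.

def sample_action(policy, state, deterministic=False):
    # Hypothetical sketch of what act does with the density from forward.
    density = policy(state)
    if deterministic:
        return density.mode      # mode of the policy distribution
    return density.sample()      # stochastic action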
forward(self, state)
¶
cherry.nn.action_value.ActionValue
¶
Description¶
Abstract Module to represent Q-value functions.
Example¶
import torch
from cherry.nn.action_value import ActionValue
from cherry.nn.mlp import MLP

class QValue(ActionValue):

    def __init__(self, state_size, action_size):
        super(QValue, self).__init__()
        self.mlp = MLP(state_size + action_size, 1, [1024, 1024])

    def forward(self, state, action):
        return self.mlp(torch.cat([state, action], dim=1))

qf = QValue(128, 5)
qvalue = qf(state, action)
forward(self, state, action = None)
¶
Description¶
Returns the scalar value for taking action action in state state.
If action is not given, it should return the value for all actions (useful for DQN-like architectures).
Arguments¶
- state (Tensor) - State to be evaluated.
- action (Tensor, optional, default=None) - Action to be evaluated.
Returns¶
- value (Tensor) - Value of taking action in state. Shape: (batch_size, 1)
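To illustrate the optional action argument, here is a hedged sketch of a discrete-action value function that returns one value per action when action is None; the DiscreteQValue name and the gather-based indexing are assumptions for illustration, not part of the library.

import torch
from cherry.nn.action_value import ActionValue
from cherry.nn.mlp import MLP

class DiscreteQValue(ActionValue):  # hypothetical DQN-style head

    def __init__(self, state_size, num_actions):
        super(DiscreteQValue, self).__init__()
        self.mlp = MLP(state_size, num_actions, [1024, 1024])

    def forward(self, state, action=None):
        all_values = self.mlp(state)  # shape: (batch_size, num_actions)
        if action is None:
            return all_values  # one value per action
        # select the value of the chosen action; shape: (batch_size, 1)
        return all_values.gather(1, action.long().view(-1, 1))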
cherry.nn.action_value.Twin
¶
Description¶
Helper class to implement Twin action-value functions as described in [1].
References¶
- Fujimoto et al., "Addressing Function Approximation Error in Actor-Critic Methods". ICML 2018.
Example¶
qvalue = Twin(QValue(128, 5), QValue(128, 5))
values = qvalue(states, actions)
values1, values2 = qvalue.twin(states, actions)
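A common use of the two heads, e.g. for TD3-style targets, is to take their element-wise minimum as a pessimistic value estimate; the short sketch below assumes the states, actions, and qvalue objects from the example above.

import torch

# Pessimistic estimate from the twin heads (as in TD3-style targets).
values1, values2 = qvalue.twin(states, actions)
min_values = torch.min(values1, values2)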
cherry.nn.robotics_layers.RoboticsLinear
¶
cherry.nn.epsilon_greedy.EpsilonGreedy
¶
Description¶
Samples actions from a uniform distribution with probability epsilon, or the action that maximizes the input values with probability 1 - epsilon.
References¶
- Sutton, Richard, and Andrew Barto. 2018. Reinforcement Learning, Second Edition. The MIT Press.
Example¶
egreedy = EpsilonGreedy()
q_values = q_value(state) # NxM tensor
actions = egreedy(q_values) # Nx1 tensor of longs
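To make the sampling rule concrete, here is a self-contained sketch of epsilon-greedy selection over a batch of Q-values; the epsilon_greedy function name and its default epsilon are illustrative, not the module's actual API.

import torch

def epsilon_greedy(q_values, epsilon=0.1):
    # q_values: NxM tensor (N states, M actions).
    batch_size, num_actions = q_values.shape
    greedy = q_values.argmax(dim=1, keepdim=True)          # Nx1 greedy actions
    uniform = torch.randint(num_actions, (batch_size, 1))  # Nx1 random actions
    explore = torch.rand(batch_size, 1) < epsilon          # Bernoulli(epsilon) mask
    return torch.where(explore, uniform, greedy)           # Nx1 tensor of longs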
cherry.nn.mlp.MLP
¶
Description¶
Implements a simple multi-layer perceptron.
Example¶
net = MLP(128, 1, [1024, 1024], activation=torch.nn.GELU)
__init__(self, input_size, output_size, hidden_sizes, activation = None, bias = True) (special)
¶
Arguments¶
- input_size (int) - Input size of the MLP.
- output_size (int) - Number of output units.
- hidden_sizes (list of int) - Each int is the number of hidden units of a layer.
- activation (callable, optional, default=None) - Activation function to use for the MLP.
- bias (bool, optional, default=True) - Whether the MLP uses bias terms.
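For completeness, a short usage sketch of a forward pass through the MLP from the example above; the batch size and the random input are arbitrary choices for illustration.

import torch
from cherry.nn.mlp import MLP

net = MLP(128, 1, [1024, 1024], activation=torch.nn.GELU)
inputs = torch.randn(32, 128)  # batch of 32 vectors with 128 features
outputs = net(inputs)          # shape: (32, 1)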