cherry.nn¶
cherry.nn.policy.Policy
¶
Abstract Module to represent policies.
Subclassing this module helps retain a unified API across codebases,
and also automatically defines some helper functions
(you only need to ensure that forward returns a Distribution instance).
Example¶
import torch
import cherry
from cherry.nn.policy import Policy

class RandomPolicy(Policy):

    def __init__(self, num_actions=5):
        super(RandomPolicy, self).__init__()
        self.num_actions = num_actions

    def forward(self, state):  # must return a density
        probs = torch.ones(self.num_actions) / self.num_actions
        density = cherry.distributions.Categorical(probs=probs)
        return density

# We can now use some predefined functions:
random_policy = RandomPolicy()
actions = random_policy.act(states, deterministic=True)
log_probs = random_policy.log_probs(states, actions)
__init__(self) -> None (inherited, special)
¶
act(self, state, deterministic = False)
¶
Description¶
Given a state, samples an action from the policy.
If deterministic=True, the action is the mode of the policy distribution.
Arguments¶
- state (Tensor) - State to take an action in.
- deterministic (bool, optional, default=False) - Whether the action is sampled (False) or the mode of the policy distribution is returned (True).
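As a rough illustration (not the library's implementation), acting amounts to sampling from the density returned by forward, or taking its mode when deterministic=True. The helper below is a minimal sketch that assumes the returned distribution exposes sample() and a mode attribute.

def sample_action(policy, state, deterministic=False):
    # Hypothetical sketch of what act does with the density from forward.
    density = policy(state)
    if deterministic:
        return density.mode      # mode of the policy distribution
    return density.sample()      # stochastic action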
forward(self, state)
¶
cherry.nn.action_value.ActionValue
¶
Description¶
Abstract Module to represent Q-value functions.
Example¶
import torch
from cherry.nn.action_value import ActionValue
from cherry.nn.mlp import MLP

class QValue(ActionValue):

    def __init__(self, state_size, action_size):
        super(QValue, self).__init__()
        self.mlp = MLP(state_size + action_size, 1, [1024, 1024])

    def forward(self, state, action):
        return self.mlp(torch.cat([state, action], dim=1))

qf = QValue(128, 5)
qvalue = qf(state, action)
forward(self, state, action = None)
¶
Description¶
Returns the scalar value for taking action action in state state.
If action is not given, it should return the value for all actions (useful for DQN-like architectures).
Arguments¶
- state (Tensor) - State to be evaluated.
- action (Tensor, optional, default=None) - Action to be evaluated.
Returns¶
- value (Tensor) - Value of taking action in state. Shape: (batch_size, 1)
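To illustrate the optional action argument, here is a hedged sketch of a discrete-action value function that returns one value per action when action is None; the DiscreteQValue name and the gather-based indexing are assumptions for illustration, not part of the library.

import torch
from cherry.nn.action_value import ActionValue
from cherry.nn.mlp import MLP

class DiscreteQValue(ActionValue):  # hypothetical DQN-style head

    def __init__(self, state_size, num_actions):
        super(DiscreteQValue, self).__init__()
        self.mlp = MLP(state_size, num_actions, [1024, 1024])

    def forward(self, state, action=None):
        all_values = self.mlp(state)  # shape: (batch_size, num_actions)
        if action is None:
            return all_values  # one value per action
        # select the value of the chosen action; shape: (batch_size, 1)
        return all_values.gather(1, action.long().view(-1, 1))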
cherry.nn.action_value.Twin
¶
Description¶
Helper class to implement Twin action-value functions as described in [1].
References¶
- Fujimoto et al., "Addressing Function Approximation Error in Actor-Critic Methods". ICML 2018.
Example¶
qvalue = Twin(QValue(128, 5), QValue(128, 5))
values = qvalue(states, actions)
values1, values2 = qvalue.twin(states, actions)
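A common use of the two heads, e.g. for TD3-style targets, is to take their element-wise minimum as a pessimistic value estimate; the short sketch below assumes the states, actions, and qvalue objects from the example above.

import torch

# Pessimistic estimate from the twin heads (as in TD3-style targets).
values1, values2 = qvalue.twin(states, actions)
min_values = torch.min(values1, values2)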
cherry.nn.robotics_layers.RoboticsLinear
¶
cherry.nn.epsilon_greedy.EpsilonGreedy
¶
Description¶
Samples actions from a uniform distribution with probability epsilon, or the action that maximizes the input values with probability 1 - epsilon.
References¶
- Sutton, Richard, and Andrew Barto. 2018. Reinforcement Learning, Second Edition. The MIT Press.
Example¶
egreedy = EpsilonGreedy()
q_values = q_value(state) # NxM tensor
actions = egreedy(q_values) # Nx1 tensor of longs
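To make the sampling rule concrete, here is a self-contained sketch of epsilon-greedy selection over a batch of Q-values; the epsilon_greedy function name and its default epsilon are illustrative, not the module's actual API.

import torch

def epsilon_greedy(q_values, epsilon=0.1):
    # q_values: NxM tensor (N states, M actions).
    batch_size, num_actions = q_values.shape
    greedy = q_values.argmax(dim=1, keepdim=True)          # Nx1 greedy actions
    uniform = torch.randint(num_actions, (batch_size, 1))  # Nx1 random actions
    explore = torch.rand(batch_size, 1) < epsilon          # Bernoulli(epsilon) mask
    return torch.where(explore, uniform, greedy)           # Nx1 tensor of longs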
cherry.nn.mlp.MLP
¶
Description¶
Implements a simple multi-layer perceptron.
Example¶
net = MLP(128, 1, [1024, 1024], activation=torch.nn.GELU)
__init__(self, input_size, output_size, hidden_sizes, activation = None, bias = True) (special)
¶
Arguments¶
- input_size (int) - Input size of the MLP.
- output_size (int) - Number of output units.
- hidden_sizes (list of int) - Each int is the number of hidden units of a layer.
- activation (callable, optional, default=None) - Activation function to use for the MLP.
- bias (bool, optional, default=True) - Whether the MLP uses bias terms.
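For completeness, a short usage sketch of a forward pass through the MLP from the example above; the batch size and the random input are arbitrary choices for illustration.

import torch
from cherry.nn.mlp import MLP

net = MLP(128, 1, [1024, 1024], activation=torch.nn.GELU)
inputs = torch.randn(32, 128)  # batch of 32 vectors with 128 features
outputs = net(inputs)          # shape: (32, 1)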