cherry¶
cherry.experience_replay.Transition¶
Description¶
Represents a (s, a, r, s', d) tuple.
All attributes (including the ones in infos) are accessible via transition.name_of_attr (e.g. transition.log_prob if log_prob is in infos).
Example¶
for transition in replay:
    print(transition.state)
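Infos are reachable the same way. A minimal sketch, assuming a log_prob entry was stored in the transition's infos (the name is illustrative):
# log_prob is only available here because it was passed via **infos at creation time.
for transition in replay:
    print(transition.log_prob)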
__init__(self, state, action, reward, next_state, done, device=None, **infos) special ¶
Arguments¶
- state (tensor) - Originating state.
- action (tensor) - Executed action.
- reward (tensor) - Observed reward.
- next_state (tensor) - Resulting state.
- done (tensor) - Is next_state a terminal (absorbing) state?
- infos (dict, optional, default=None) - Additional information on the transition.
to(self, *args, **kwargs)¶
Description¶
Moves the constituents of the transition to the desired device, and casts them to the desired format.
Note: This is done in-place and doesn't create a new transition.
Arguments¶
- device (device, optional, default=None) - The device to move the data to.
- dtype (dtype, optional, default=None) - The torch.dtype format to cast to.
- non_blocking (bool, optional, default=False) - Whether to perform the move asynchronously.
Example¶
sars = Transition(state, action, reward, next_state, done)
sars.to('cuda')
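The dtype and non_blocking arguments are assumed to follow the same calling conventions as torch.Tensor.to; a small sketch under that assumption (torch is assumed to be imported):
# Cast the transition's tensors to half precision on the GPU, in-place.
# non_blocking=True only helps when the source tensors live in pinned memory.
sars.to(device='cuda', dtype=torch.float16, non_blocking=True)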
cherry.experience_replay.ExperienceReplay¶
Description¶
Experience replay buffer to store, retrieve, and sample past transitions.
ExperienceReplay behaves like a list of transitions.
It also supports accessing specific properties, such as states, actions, rewards, next_states, and infos.
The first four are returned as tensors, while infos is returned as a list of dicts.
The properties of infos can be accessed directly by appending an s to their dictionary key -- see Examples below.
In this case, if the values of the infos are tensors, they will be returned as a concatenated Tensor.
Otherwise, they default to a list of values.
References¶
- Lin, Long-Ji. 1992. “Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching.” Machine Learning 8 (3): 293–321.
Example¶
replay = ch.ExperienceReplay()  # Instantiate a new replay
replay.append(state,  # Add experience to the replay
              action,
              reward,
              next_state,
              done,
              density=action_density,
              log_prob=action_density.log_prob(action),
              )
replay.state()  # Tensor of states
replay.action()  # Tensor of actions
replay.density()  # list of action_density
replay.log_prob()  # Tensor of log_probabilities
new_replay = replay[-10:]  # Last 10 transitions in new_replay
# Sample some previous experience
batch = replay.sample(32, contiguous=True)
__init__(self, storage=None, device=None, vectorized=False) special ¶
Arguments¶
- storage (list, optional, default=None) - A list of Transitions.
- device (torch.device, optional, default=None) - The device of the replay.
- vectorized (bool, optional, default=False) - Whether the transitions are vectorized or not (see the sketch below).
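A small construction sketch; the device and the vectorized flag below are illustrative choices, not defaults (torch and the ch alias for cherry are assumed to be imported):
# An empty replay whose tensors will live on GPU; vectorized=True is meant for
# transitions that stack several timesteps per entry (see flatten below).
replay = ch.ExperienceReplay(device=torch.device('cuda'), vectorized=True)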
append(self, state=None, action=None, reward=None, next_state=None, done=None, **infos)¶
Description¶
Appends a new transition to the experience replay.
Arguments¶
- state (tensor/ndarray/list) - Originating state.
- action (tensor/ndarray/list) - Executed action.
- reward (tensor/ndarray/list) - Observed reward.
- next_state (tensor/ndarray/list) - Resulting state.
- done (tensor/bool) - Is next_state a terminal (absorbing) state?
- infos (dict, optional, default=None) - Additional information on the transition.
Example¶
replay.append(state, action, reward, next_state, done,
              density=density,
              log_prob=density.log_prob(action))
flatten(self)¶
Description¶
Returns a "flattened" version of the replay, where each transition only contains one timestep.
Assuming the original replay has N transitions each with M timesteps -- i.e. sars.state has shape (M, state_size) -- this method returns a new replay with NM transitions (and states of shape (*state_size)).
Note: This method breaks the timestep order, so transitions are not consecutive anymore.
Note: No-op if not vectorized.
Example¶
flat_replay = replay.flatten()
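A minimal sketch of the shape bookkeeping described above, assuming a vectorized replay whose appended tensors carry a leading dimension of M timesteps (sizes and random data are illustrative):
M, state_size, N = 4, 8, 10  # hypothetical sizes
vec_replay = ch.ExperienceReplay(vectorized=True)
for _ in range(N):  # N vectorized transitions, each holding M timesteps
    vec_replay.append(torch.randn(M, state_size),  # state
                      torch.randn(M, 1),           # action
                      torch.randn(M, 1),           # reward
                      torch.randn(M, state_size),  # next_state
                      torch.zeros(M, 1))           # done
flat_replay = vec_replay.flatten()
assert len(flat_replay) == N * M  # NM single-timestep transitions, per the description above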
load(self, path)¶
sample(self, size=1, contiguous=False, episodes=False, nsteps=1, discount=1.0)¶
Description¶
Samples from the experience replay.
Arguments¶
- size (int, optional, default=1) - The number of samples.
- contiguous (bool, optional, default=False) - Whether to sample contiguous transitions.
- episodes (bool, optional, default=False) - Sample full episodes, instead of transitions.
- nsteps (int, optional, default=1) - Steps to compute the n-steps returns.
- discount (float, optional, default=1.0) - Discount for n-steps returns.
Returns¶
- ExperienceReplay - New ExperienceReplay containing the sampled transitions.
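A short usage sketch, assuming replay already holds complete episodes; when episodes=True, size presumably counts episodes rather than individual transitions:
batch = replay.sample(32)                           # 32 transitions drawn at random
episode = replay.sample(1, episodes=True)           # one full episode
nstep = replay.sample(32, nsteps=5, discount=0.99)  # transitions with 5-step discounted returns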
save(self, path)¶
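A minimal round-trip sketch for save and load, assuming path is an ordinary file path (the .pt name is illustrative) and that load repopulates the replay in place:
replay.save('./replay.pt')     # serialize the stored transitions to disk
fresh = ch.ExperienceReplay()
fresh.load('./replay.pt')      # assumed to restore the saved transitions into fresh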
to(self, *args, **kwargs)¶
Description¶
Calls .to() on all transitions of the experience replay, moving them to the desired device and casting them to the desired format.
Note: This returns a new experience replay, but the transitions are modified in-place.
Arguments¶
- device (device, optional, default=None) - The device to move the data to.
- dtype (dtype, optional, default=None) - The torch.dtype format to cast to.
- non_blocking (bool, optional, default=False) - Whether to perform the move asynchronously.
Example¶
replay.to('cuda:1')
policy.to('cuda:1')
for sars in replay:
    cuda_action = policy(sars.state).sample()
cherry._torch.totensor(array, dtype=None)¶
Description¶
Converts the argument array to a 1xN torch.Tensor, regardless of its type or dimension.
Arguments¶
- array (int, float, ndarray, tensor) - Data to be converted to a tensor.
- dtype (dtype, optional, default=None) - Data type to use for representation. By default, uses torch.get_default_dtype().
Returns¶
- Tensor of shape 1xN with the appropriate data type.
Example¶
array = [5, 6, 7.0]
tensor = cherry.totensor(array, dtype=th.float32)
array = np.array(array, dtype=np.float64)
tensor = cherry.totensor(array, dtype=th.float16)
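Following the Returns note above, a quick shape check under the assumption that a flat list of length N maps to a 1xN tensor:
assert cherry.totensor([5, 6, 7.0]).shape == (1, 3)  # assumed 1x3 for a length-3 list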
cherry._torch.normalize(tensor, epsilon=1e-08)¶
Description¶
Normalizes a tensor to have zero mean and unit standard deviation.
Arguments¶
- tensor (tensor) - The tensor to normalize.
- epsilon (float, optional, default=1e-8) - Numerical stability constant for normalization.
Returns¶
- A new tensor, containing the normalized values.
Example¶
tensor = torch.arange(23) / 255.0
tensor = cherry.normalize(tensor, epsilon=1e-3)
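As a sanity check on the definition, a sketch with arbitrary tolerances (torch is assumed to be imported):
normed = cherry.normalize(torch.randn(1000))
assert normed.mean().abs() < 1e-5         # approximately zero mean
assert (normed.std() - 1.0).abs() < 1e-3  # approximately unit standard deviation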