cherry

cherry.experience_replay.Transition

[Source]

Description

Represents a (s, a, r, s', d) tuple.

All attributes (including those passed via infos) are accessible as transition.name_of_attr (e.g. transition.log_prob if log_prob is a key in infos).

Example
for transition in replay:
    print(transition.state)

__init__(self, state, action, reward, next_state, done, device = None, **infos) special

Arguments
  • state (tensor) - Originating state.
  • action (tensor) - Executed action.
  • reward (tensor) - Observed reward.
  • next_state (tensor) - Resulting state.
  • done (tensor) - Is next_state a terminal (absorbing) state?
  • infos (dict, optional, default=None) - Additional information on the transition.
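
For illustration, a minimal sketch of building a transition with an extra info field (the log_prob keyword here is an arbitrary example, not part of the signature):

sars = Transition(state, action, reward, next_state, done, log_prob=log_prob)
print(sars.log_prob)  # keys passed through infos become attributes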

to(self, *args, **kwargs)

Description

Moves the constituents of the transition to the desired device, and casts them to the desired format.

Note: This is done in-place and doesn't create a new transition.

Arguments

  • device (device, optional, default=None) - The device to move the data to.
  • dtype (dtype, optional, default=None) - The torch.dtype format to cast to.
  • non_blocking (bool, optional, default=False) - Whether to perform the move asynchronously.

Example

sars = Transition(state, action, reward, next_state, done)
sars.to('cuda')
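
The dtype and non_blocking arguments follow the usual torch .to() semantics; for instance, a sketch of casting the same transition to double precision (assuming torch is imported):

sars.to(dtype=torch.float64)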

cherry.experience_replay.ExperienceReplay

[Source]

Description

Experience replay buffer to store, retrieve, and sample past transitions.

ExperienceReplay behaves like a list of transitions. It also supports accessing specific properties, such as states, actions, rewards, next_states, and infos. The first four are returned as tensors, while infos is returned as a list of dicts. The values stored in infos can be accessed directly via their dictionary key -- see the examples below. In this case, if the values in infos are tensors, they are returned as a concatenated tensor; otherwise, they default to a list of values.

References
  1. Lin, Long-Ji. 1992. “Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching.” Machine Learning 8 (3): 293–321.
Example
import cherry as ch

replay = ch.ExperienceReplay()  # Instantiate a new replay
replay.append(state,  # Add experience to the replay
              action,
              reward,
              next_state,
              done,
              density=action_density,
              log_prob=action_density.log_prob(action),
              )

replay.state()  # Tensor of states
replay.action()  # Tensor of actions
replay.density()  # list of action_density
replay.log_prob()  # Tensor of log_probabilities

new_replay = replay[-10:]  # Last 10 transitions in new_replay

# Sample some previous experience
batch = replay.sample(32, contiguous=True)

__init__(self, storage = None, device = None, vectorized = False) special

Arguments
  • storage (list, optional, default=None) - A list of Transitions.
  • device (torch.device, optional, default=None) - The device of the replay.
  • vectorized (bool, optional, default=False) - Whether the transitions are vectorized or not.
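
For illustration, a minimal sketch of constructing replays with these arguments (assuming import cherry as ch, import torch, and an available CUDA device):

replay = ch.ExperienceReplay(device=torch.device('cuda'))  # transitions will live on the GPU
vec_replay = ch.ExperienceReplay(vectorized=True)  # each transition holds several timesteps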

append(self, state = None, action = None, reward = None, next_state = None, done = None, **infos)

Description

Appends new data to the ExperienceReplay.

Arguments
  • state (tensor/ndarray/list) - Originating state.
  • action (tensor/ndarray/list) - Executed action.
  • reward (tensor/ndarray/list) - Observed reward.
  • next_state (tensor/ndarray/list) - Resulting state.
  • done (tensor/bool) - Is next_state a terminal (absorbing) state?
  • infos (dict, optional, default=None) - Additional information on the transition.
Example
replay.append(state, action, reward, next_state, done,
              density=density,
              log_prob=density.log_prob(action))

empty(self)

Description

Removes all data from an ExperienceReplay.

Example
replay.empty()

flatten(self)

Description

Returns a "flattened" version of the replay, where each transition only contains one timestep.

Assuming the original replay has N transitions, each with M timesteps -- i.e. sars.state has shape (M, state_size) -- this method returns a new replay with N * M transitions (and states of shape (*state_size)).

Note: This method breaks the timestep order, so transitions are not consecutive anymore.

Note: No-op if not vectorized.

Example
flat_replay = replay.flatten()
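
As a rough sanity check, a sketch assuming replay is vectorized and each of its transitions holds M timesteps:

flat_replay = replay.flatten()
assert len(flat_replay) == len(replay) * M
assert flat_replay[0].state.shape == replay[0].state.shape[1:]  # leading timestep dim removed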

load(self, path)

Description

Loads data from a serialized ExperienceReplay.

Arguments
  • path (str) - File path of serialized ExperienceReplay.
Example
replay.load('my_replay_file.pt')

sample(self, size = 1, contiguous = False, episodes = False, nsteps = 1, discount = 1.0)

Description

Samples from the ExperienceReplay.

Arguments
  • size (int, optional, default=1) - The number of samples.
  • contiguous (bool, optional, default=False) - Whether to sample contiguous transitions.
  • episodes (bool, optional, default=False) - Sample full episodes, instead of transitions.
  • nsteps (int, optional, default=1) - Number of steps over which to compute the n-step returns.
  • discount (float, optional, default=1.0) - Discount factor for n-step returns.
Returns
  • ExperienceReplay - New ExperienceReplay containing the sampled transitions.
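
Example

The values below are arbitrary; this is a sketch of the documented call patterns:

batch = replay.sample(32)  # 32 random transitions
batch = replay.sample(32, contiguous=True)  # 32 consecutive transitions
episodes = replay.sample(5, episodes=True)  # 5 full episodes
nstep = replay.sample(16, nsteps=3, discount=0.99)  # transitions with 3-step discounted returns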

save(self, path)

Description

Serializes and saves the ExperienceReplay into the given path.

Arguments
  • path (str) - File path.
Example
replay.save('my_replay_file.pt')

to(self, *args, **kwargs)

Description

Calls .to() on all transitions of the experience replay, moving them to the desired device and casting them to the desired format.

Note: This returns a new experience replay, but the transitions are modified in-place.

Arguments
  • device (device, optional, default=None) - The device to move the data to.
  • dtype (dtype, optional, default=None) - The torch.dtype format to cast to.
  • non_blocking (bool, optional, default=False) - Whether to perform the move asynchronously.
Example
replay.to('cuda:1')
policy.to('cuda:1')
for sars in replay:
    cuda_action = policy(sars.state).sample()

cherry._torch.totensor(array, dtype = None)

[Source]

Description

Converts the argument array to a torch.Tensor of shape 1xN, regardless of its type or dimension.

Arguments

  • array (int, float, ndarray, tensor) - Data to be converted to a tensor.
  • dtype (dtype, optional, default=None) - Data type to use for representation. By default, uses torch.get_default_dtype().

Returns

  • Tensor of shape 1xN with the appropriate data type.

Example

import cherry
import numpy as np
import torch as th

array = [5, 6, 7.0]
tensor = cherry.totensor(array, dtype=th.float32)
array = np.array(array, dtype=np.float64)
tensor = cherry.totensor(array, dtype=th.float16)
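
Given the 1xN shape convention above, both converted tensors here should have shape (1, 3) for this three-element input; a quick check:

print(tensor.shape)  # expected: torch.Size([1, 3])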

cherry._torch.normalize(tensor, epsilon = 1e-08)

[Source]

Description

Normalizes a tensor to have zero mean and unit standard deviation.

Arguments

  • tensor (tensor) - The tensor to normalize.
  • epsilon (float, optional, default=1e-8) - Numerical stability constant for normalization.

Returns

  • A new tensor, containing the normalized values.

Example

tensor = torch.arange(23) / 255.0
tensor = cherry.normalize(tensor, epsilon=1e-3)
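
As a quick sanity check, the result should have approximately zero mean and unit standard deviation; a sketch assuming torch and cherry are imported:

values = torch.randn(100) * 3.0 + 5.0
normalized = cherry.normalize(values)
print(normalized.mean())  # close to 0
print(normalized.std())   # close to 1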