cherry¶
cherry.experience_replay.Transition¶
Description¶
Represents a (s, a, r, s', d) tuple.
All attributes (including the ones in infos) are accessible via transition.name_of_attr (e.g. transition.log_prob if log_prob is in infos).
Example¶
for transition in replay:
    print(transition.state)
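Infos are reachable the same way. A minimal sketch, assuming a log_prob entry was stored in the transition's infos (the name is illustrative):
# log_prob is only available here because it was passed via **infos at creation time.
for transition in replay:
    print(transition.log_prob)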
__init__(self, state, action, reward, next_state, done, device=None, **infos) special ¶
Arguments¶
- state (tensor) - Originating state.
- action (tensor) - Executed action.
- reward (tensor) - Observed reward.
- next_state (tensor) - Resulting state.
- done (tensor) - Is next_state a terminal (absorbing) state?
- infos (dict, optional, default=None) - Additional information on the transition.
to(self, *args, **kwargs)¶
Description¶
Moves the constituents of the transition to the desired device, and casts them to the desired format.
Note: This is done in-place and doesn't create a new transition.
Arguments¶
- device (device, optional, default=None) - The device to move the data to.
- dtype (dtype, optional, default=None) - The torch.dtype format to cast to.
- non_blocking (bool, optional, default=False) - Whether to perform the move asynchronously.
Example¶
sars = Transition(state, action, reward, next_state, done)
sars.to('cuda')
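The dtype and non_blocking arguments are assumed to follow the same calling conventions as torch.Tensor.to; a small sketch under that assumption (torch is assumed to be imported):
# Cast the transition's tensors to half precision on the GPU, in-place.
# non_blocking=True only helps when the source tensors live in pinned memory.
sars.to(device='cuda', dtype=torch.float16, non_blocking=True)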
cherry.experience_replay.ExperienceReplay¶
Description¶
Experience replay buffer to store, retrieve, and sample past transitions.
ExperienceReplay behaves like a list of transitions.
It also supports accessing specific properties, such as states, actions, rewards, next_states, and infos.
The first four are returned as tensors, while infos is returned as a list of dicts.
The properties of infos can be accessed directly by appending an s to their dictionary key -- see Examples below.
In this case, if the values of the infos are tensors, they will be returned as a concatenated Tensor.
Otherwise, they default to a list of values.
References¶
- Lin, Long-Ji. 1992. “Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching.” Machine Learning 8 (3): 293–321.
Example¶
replay = ch.ExperienceReplay()  # Instantiate a new replay
replay.append(state,  # Add experience to the replay
              action,
              reward,
              next_state,
              done,
              density=action_density,
              log_prob=action_density.log_prob(action),
              )
replay.state()  # Tensor of states
replay.action()  # Tensor of actions
replay.density()  # list of action_density
replay.log_prob()  # Tensor of log_probabilities
new_replay = replay[-10:]  # Last 10 transitions in new_replay
# Sample some previous experience
batch = replay.sample(32, contiguous=True)
__init__(self, storage=None, device=None, vectorized=False) special ¶
Arguments¶
- storage (list, optional, default=None) - A list of Transitions.
- device (torch.device, optional, default=None) - The device of the replay.
- vectorized (bool, optional, default=False) - Whether the transitions are vectorized or not (see the sketch below).
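A small construction sketch; the device and the vectorized flag below are illustrative choices, not defaults (torch and the ch alias for cherry are assumed to be imported):
# An empty replay whose tensors will live on GPU; vectorized=True is meant for
# transitions that stack several timesteps per entry (see flatten below).
replay = ch.ExperienceReplay(device=torch.device('cuda'), vectorized=True)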
append(self, state=None, action=None, reward=None, next_state=None, done=None, **infos)¶
Description¶
Appends a new transition to the experience replay.
Arguments¶
- state (tensor/ndarray/list) - Originating state.
- action (tensor/ndarray/list) - Executed action.
- reward (tensor/ndarray/list) - Observed reward.
- next_state (tensor/ndarray/list) - Resulting state.
- done (tensor/bool) - Is next_state a terminal (absorbing) state?
- infos (dict, optional, default=None) - Additional information on the transition.
Example¶
replay.append(state, action, reward, next_state, done,
              density=density,
              log_prob=density.log_prob(action))
flatten(self)¶
Description¶
Returns a "flattened" version of the replay, where each transition only contains one timestep.
Assuming the original replay has N transitions each with M timesteps -- i.e. sars.state has shape (M, state_size) -- this method returns a new replay with NM transitions (and states of shape (*state_size)).
Note: This method breaks the timestep order, so transitions are not consecutive anymore.
Note: No-op if not vectorized.
Example¶
flat_replay = replay.flatten()
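A minimal sketch of the shape bookkeeping described above, assuming a vectorized replay whose appended tensors carry a leading dimension of M timesteps (sizes and random data are illustrative):
M, state_size, N = 4, 8, 10  # hypothetical sizes
vec_replay = ch.ExperienceReplay(vectorized=True)
for _ in range(N):  # N vectorized transitions, each holding M timesteps
    vec_replay.append(torch.randn(M, state_size),  # state
                      torch.randn(M, 1),           # action
                      torch.randn(M, 1),           # reward
                      torch.randn(M, state_size),  # next_state
                      torch.zeros(M, 1))           # done
flat_replay = vec_replay.flatten()
assert len(flat_replay) == N * M  # NM single-timestep transitions, per the description above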
load(self, path)¶
sample(self, size=1, contiguous=False, episodes=False, nsteps=1, discount=1.0)¶
Description¶
Samples from the experience replay.
Arguments¶
- size (int, optional, default=1) - The number of samples.
- contiguous (bool, optional, default=False) - Whether to sample contiguous transitions.
- episodes (bool, optional, default=False) - Sample full episodes, instead of transitions.
- nsteps (int, optional, default=1) - Steps to compute the n-steps returns.
- discount (float, optional, default=1.0) - Discount for n-steps returns.
Returns¶
- ExperienceReplay - New ExperienceReplay containing the sampled transitions.
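A short usage sketch, assuming replay already holds complete episodes; when episodes=True, size presumably counts episodes rather than individual transitions:
batch = replay.sample(32)                           # 32 transitions drawn at random
episode = replay.sample(1, episodes=True)           # one full episode
nstep = replay.sample(32, nsteps=5, discount=0.99)  # transitions with 5-step discounted returns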
save(self, path)¶
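A minimal round-trip sketch for save and load, assuming path is an ordinary file path (the .pt name is illustrative) and that load repopulates the replay in place:
replay.save('./replay.pt')     # serialize the stored transitions to disk
fresh = ch.ExperienceReplay()
fresh.load('./replay.pt')      # assumed to restore the saved transitions into fresh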
to(self, *args, **kwargs)¶
Description¶
Calls .to() on all transitions of the experience replay, moving them to the desired device and casting them to the desired format.
Note: This returns a new experience replay, but the transitions are modified in-place.
Arguments¶
- device (device, optional, default=None) - The device to move the data to.
- dtype (dtype, optional, default=None) - The torch.dtype format to cast to.
- non_blocking (bool, optional, default=False) - Whether to perform the move asynchronously.
Example¶
replay.to('cuda:1')
policy.to('cuda:1')
for sars in replay:
    cuda_action = policy(sars.state).sample()
cherry._torch.totensor(array, dtype=None)¶
Description¶
Converts the argument array to a 1xN torch.Tensor, regardless of its type or dimension.
Arguments¶
- array (int, float, ndarray, tensor) - Data to be converted to a tensor.
- dtype (dtype, optional, default=None) - Data type to use for representation. By default, uses torch.get_default_dtype().
Returns¶
- Tensor of shape 1xN with the appropriate data type.
Example¶
array = [5, 6, 7.0]
tensor = cherry.totensor(array, dtype=th.float32)
array = np.array(array, dtype=np.float64)
tensor = cherry.totensor(array, dtype=th.float16)
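Following the Returns note above, a quick shape check under the assumption that a flat list of length N maps to a 1xN tensor:
assert cherry.totensor([5, 6, 7.0]).shape == (1, 3)  # assumed 1x3 for a length-3 list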
cherry._torch.normalize(tensor, epsilon=1e-08)¶
Description¶
Normalizes a tensor to have zero mean and unit standard deviation.
Arguments¶
- tensor (tensor) - The tensor to normalize.
- epsilon (float, optional, default=1e-8) - Numerical stability constant for normalization.
Returns¶
- A new tensor, containing the normalized values.
Example¶
tensor = torch.arange(23) / 255.0
tensor = cherry.normalize(tensor, epsilon=1e-3)
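As a sanity check on the definition, a sketch with arbitrary tolerances (torch is assumed to be imported):
normed = cherry.normalize(torch.randn(1000))
assert normed.mean().abs() < 1e-5         # approximately zero mean
assert (normed.std() - 1.0).abs() < 1e-3  # approximately unit standard deviation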