Transition

Transition(state, action, reward, next_state, done, device=None, **infos)

Description

Represents an (s, a, r, s', d) tuple.

All attributes (including those passed in infos) are accessible via transition.name_of_attr (e.g. transition.log_prob if log_prob is in infos).

Arguments

Example

for transition in replay:
    print(transition.state)
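
A complementary sketch of building a Transition directly and reading back an info attribute; it assumes Transition is exposed at the top level of cherry, and the dummy tensors below stand in for real environment data:

import torch as th
import cherry as ch

state = th.randn(1, 4)
action = th.zeros(1, 1)
reward = th.ones(1, 1)
next_state = th.randn(1, 4)
done = th.zeros(1, 1)

sars = ch.Transition(state, action, reward, next_state, done,
                     log_prob=th.tensor([[-0.7]]))  # extra info passed through **infos
print(sars.state)     # stored state tensor
print(sars.log_prob)  # info attribute, accessible by name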

to

Transition.to(*args, **kwargs)

Description

Moves the constituents of the transition to the desired device, and casts them to the desired format.

Note: This is done in-place and doesn't create a new transition.

Arguments

Example

sars = Transition(state, action, reward, next_state, done)
sars.to('cuda')
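
If the transition should also be cast, here is a hedged variant assuming Transition.to forwards its arguments to torch.Tensor.to:

import torch as th

if th.cuda.is_available():
    sars.to('cuda', th.float16)  # move to GPU and cast to half precision, in-place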

ExperienceReplay

ExperienceReplay(storage=None, device=None)

[Source]

Description

Experience replay buffer to store, retrieve, and sample past transitions.

ExperienceReplay behaves like a list of transitions. It also supports accessing the states, actions, rewards, next states, and infos of all stored transitions at once. The first four are returned as tensors, while infos are returned as a list of dicts. Individual infos values can also be accessed directly through their dictionary key -- see the Examples below. In that case, if the values are tensors, they are returned as a single concatenated tensor; otherwise, they default to a list of values.

Arguments

References

  1. Lin, Long-Ji. 1992. “Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching.” Machine Learning 8 (3): 293–321.

Example

replay = ch.ExperienceReplay()  # Instantiate a new replay
replay.append(state,  # Add experience to the replay
              action,
              reward,
              next_state,
              done,
              density=action_density,
              log_prob=action_density.log_prob(action),
              )

replay.state()  # Tensor of states
replay.action()  # Tensor of actions
replay.density()  # list of action_density
replay.log_prob()  # Tensor of log_probabilities

new_replay = replay[-10:]  # Last 10 transitions in new_replay

# Sample some previous experience
batch = replay.sample(32, contiguous=True)
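
Since the replay behaves like a list, the usual sequence operations also apply; a short sketch (assuming len and single-item indexing work as for Python lists):

print(len(replay))       # number of stored transitions
first = replay[0]        # a single Transition
for sars in replay[-10:]:
    print(sars.reward)   # iterate over the last 10 transitions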

save

ExperienceReplay.save(path)

Description

Serializes and saves the ExperienceReplay to the given path.

Arguments

Example

replay.save('my_replay_file.pt')

load

ExperienceReplay.load(path)

Description

Loads data from a serialized ExperienceReplay.

Arguments

Example

replay.load('my_replay_file.pt')
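
A minimal round-trip sketch combining save and load (assuming ch is the imported cherry module and len reports the replay's size):

replay.save('my_replay_file.pt')
restored = ch.ExperienceReplay()
restored.load('my_replay_file.pt')
assert len(restored) == len(replay)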

append

ExperienceReplay.append(state=None,
                        action=None,
                        reward=None,
                        next_state=None,
                        done=None,
                        **infos)

Description

Appends new data to the ExperienceReplay.

Arguments

Example

replay.append(state, action, reward, next_state, done,
              density=density,
              log_prob=density.log_prob(action))
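
A toy collection loop with random data, illustrating how transitions and extra infos accumulate in the replay (the tensors below are placeholders for real environment interactions, and len is assumed to report the replay's size):

import torch as th
import cherry as ch

replay = ch.ExperienceReplay()
state = th.randn(1, 4)
for step in range(10):
    action = th.randint(0, 2, (1, 1))
    next_state = th.randn(1, 4)
    reward = th.randn(1, 1)
    done = th.tensor([[float(step == 9)]])
    replay.append(state, action, reward, next_state, done,
                  step=th.tensor([[float(step)]]))  # extra info stored with each transition
    state = next_state
print(len(replay))  # 10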

sample

ExperienceReplay.sample(size=1, contiguous=False, episodes=False)

Description

Samples from the experience replay.

Arguments

Returns
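
Example

A sketch of the three sampling modes, assuming the replay already holds enough transitions and finished episodes:

batch = replay.sample(32)                     # 32 transitions drawn at random
batch = replay.sample(32, contiguous=True)    # 32 consecutive transitions
episodes = replay.sample(2, episodes=True)    # 2 complete episodes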

empty

ExperienceReplay.empty()

Description

Removes all data from an ExperienceReplay.

Example

replay.empty()
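
A before-and-after sketch, typically used between on-policy updates so the next batch starts from a fresh buffer (assuming len reports the number of stored transitions):

replay.append(state, action, reward, next_state, done)
print(len(replay))  # at least 1
replay.empty()
print(len(replay))  # 0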

to

ExperienceReplay.to(*args, **kwargs)

Description

Calls .to() on all transitions of the experience replay, moving them to the desired device and casting them to the desired format.

Note: This returns a new experience replay, but the transitions themselves are modified in-place.

Arguments

Example

replay.to('cuda:1')
policy.to('cuda:1')
for sars in replay:
    cuda_action = policy(sars.state).sample()
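
A device-guarded variant of the same idea; it assumes .to accepts the same device arguments as torch.Tensor.to:

import torch as th

device = 'cuda:1' if th.cuda.is_available() else 'cpu'
replay = replay.to(device)  # new replay handle; the underlying transitions are moved in-place
policy.to(device)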

totensor

totensor(array, dtype=None)

[Source]

Description

Converts the argument array to a 1xN torch.Tensor, regardless of its type or dimension.

Arguments

Returns

Example

array = [5, 6, 7.0]
tensor = cherry.totensor(array, dtype=th.float32)
array = np.array(array, dtype=np.float64)
tensor = cherry.totensor(array, dtype=th.float16)
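
A sketch of the added batch dimension, using a flat NumPy observation (the shape comment follows from the 1xN behaviour described above):

import numpy as np
import torch as th
import cherry

obs = np.random.randn(4)      # an un-batched observation
batch = cherry.totensor(obs)
print(batch.shape)            # torch.Size([1, 4])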

normalize

normalize(tensor, epsilon=1e-08)

[Source]

Description

Normalizes a tensor to have zero mean and unit standard deviation.

Arguments

Returns

Example

tensor = torch.arange(23) / 255.0
tensor = cherry.normalize(tensor, epsilon=1e-3)
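
A quick check of the zero-mean, unit-standard-deviation property described above:

import torch
import cherry

returns = torch.randn(128) * 10.0 + 5.0  # values with arbitrary mean and scale
normed = cherry.normalize(returns)
print(normed.mean().item(), normed.std().item())  # approximately 0.0 and 1.0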