cherry.envs.utils

Description

Helper functions for OpenAI Gym environments.

is_discrete

is_discrete(space, vectorized=False)

Returns whether a space is discrete.

Arguments
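
Example

A minimal sketch, assuming the helper is importable from the cherry.envs.utils module documented here:

import gym
from cherry.envs.utils import is_discrete

env = gym.make('CartPole-v0')
is_discrete(env.action_space)       # True: Discrete(2)
is_discrete(env.observation_space)  # False: Box(4,)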

get_space_dimension

get_space_dimension(space, vectorized=False)

Returns the number of elements of a space sample, when unrolled.

Arguments
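
Example

A minimal sketch, assuming the helper is importable from the cherry.envs.utils module documented here:

import gym
from cherry.envs.utils import get_space_dimension

env = gym.make('CartPole-v0')
get_space_dimension(env.observation_space)  # 4, since observations are Box(4,)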

Wrapper

Wrapper(env)

This class makes it possible to chain environment wrappers while retaining access to the attributes and methods of the wrappers deeper in the chain.

Example:

env = gym.make('MyEnv-v0')
env = envs.Logger(env)
env = envs.Runner(env)
env.log('asdf', 23)  # Uses log() method from envs.Logger.

Runner

Runner(env)

Runner wrapper.

TODO: When the environment is vectorized and episodes=n is used, sample the n episodes with the parallel environments and stack them inside a flat replay.

run

Runner.run(get_action, steps=None, episodes=None, render=False)

Samples experience from the wrapped environment by calling get_action on each state, for a given number of steps or episodes.
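
Example

A minimal sketch, assuming get_action is any callable mapping the current state to an action and that run() returns the collected transitions:

env = gym.make('CartPole-v0')
env = envs.Runner(env)

def get_action(state):
    # Random policy, purely for illustration.
    return env.action_space.sample()

replay = env.run(get_action, steps=100)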

Logger

Logger(env, interval=1000, episode_interval=10, title=None, logger=None)

Tracks and prints some common statistics about the environment.
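
Example

A minimal sketch; reading interval as the number of steps between printed summaries is an assumption based on the signature:

env = gym.make('CartPole-v0')
env = envs.Logger(env, interval=1000)
env = envs.Runner(env)
env.run(lambda state: env.action_space.sample(), steps=5000)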

Recorder

Recorder(env, directory='./videos/', format='gif', suffix=None)

[Source]

Description

Wrapper to record episodes from a rollout. Supports GIF and MP4 encoding.

Arguments

Credit

Adapted from OpenAI Gym's Monitor wrapper.

Example

env = gym.make('CartPole-v0')
env = envs.Recorder(env, './videos/', format='gif')
env = envs.Runner(env)
env.run(get_action, episodes=3, render=True)

close

Recorder.close()

Flushes all monitor data to disk and closes any open rendering windows.

VisdomLogger

VisdomLogger(env,
             interval=1000,
             episode_interval=10,
             render=True,
             title=None,
             logger=None)

Enables logging and plotting of debug values to Visdom.

Arguments
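
Example

A minimal sketch, assuming a Visdom server is already running (e.g. started with python -m visdom.server):

env = gym.make('CartPole-v0')
env = envs.VisdomLogger(env, interval=1000, render=True)
env = envs.Runner(env)
env.run(lambda state: env.action_space.sample(), episodes=10)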

Torch

Torch(env)

This wrapper converts:

* actions from Tensors to numpy,
* states from lists/numpy to Tensors.

Example

from torch import Tensor
from torch.distributions import Categorical

action = Categorical(Tensor([1, 2, 3])).sample()
env.step(action)

Normalizer

Normalizer(env,
           states=True,
           rewards=True,
           clip_states=10.0,
           clip_rewards=10.0,
           gamma=0.99,
           eps=1e-08)

[Source]

Description

Normalizes the states and rewards with a running average.

Arguments

Credit

Adapted from OpenAI's baselines implementation.

Example

env = gym.make('CartPole-v0')
env = cherry.envs.Normalizer(env,
                             states=True,
                             rewards=False)

StateNormalizer

StateNormalizer(env, statistics=None, beta=0.99, eps=1e-08)

[Source]

Description

Normalizes the states with a running average.

Arguments

Credit

Adapted from Tristan Deleu's implementation.

Example

env = gym.make('CartPole-v0')
env = cherry.envs.StateNormalizer(env)
env2 = gym.make('CartPole-v0')
env2 = cherry.envs.StateNormalizer(env2,
                                   statistics=env.statistics)

RewardNormalizer

RewardNormalizer(env, statistics=None, beta=0.99, eps=1e-08)

[Source]

Description

Normalizes the rewards with a running average.

Arguments

Credit

Adapted from Tristan Deleu's implementation.

Example

env = gym.make('CartPole-v0')
env = cherry.envs.RewardNormalizer(env)
env2 = gym.make('CartPole-v0')
env2 = cherry.envs.RewardNormalizer(env2,
                                    statistics=env.statistics)

RewardClipper

RewardClipper(env)

reward

RewardClipper.reward(reward)

Bin reward to {+1, 0, -1} by its sign.
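
Example

A minimal sketch of the sign-based binning described above; the exact numeric type of the result is not checked here:

env = gym.make('CartPole-v0')
env = envs.RewardClipper(env)
env.reward(3.7)   # -> +1
env.reward(0.0)   # -> 0
env.reward(-0.2)  # -> -1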

Monitor

Monitor(env, directory, *args, **kwargs)

Sugar coating on top of Gym's Monitor.
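
Example

A minimal sketch, assuming the remaining positional and keyword arguments are forwarded to Gym's Monitor; './monitor_logs/' is just an illustrative directory:

env = gym.make('CartPole-v0')
env = envs.Monitor(env, './monitor_logs/')
env = envs.Runner(env)
env.run(lambda state: env.action_space.sample(), episodes=3)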

OpenAIAtari

OpenAIAtari(env)

AddTimestep

AddTimestep(env=None)

Adds timestep information to the state input.

Modified from Ilya Kostrikov's implementation:

https://github.com/ikostrikov/pytorch-a2c-ppo-acktr/
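
Example

A minimal sketch; that the timestep is appended as an extra entry of the observation is an assumption based on the description above:

env = gym.make('CartPole-v0')  # observations are Box(4,)
env = envs.AddTimestep(env)    # assumed: observations now also carry the current timestep
state = env.reset()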

StateLambda

StateLambda(env, fn)

ActionLambda

ActionLambda(env, fn)
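
Example

A minimal sketch covering both wrappers, assuming fn is applied to every state (StateLambda) and to every action before it reaches the wrapped environment (ActionLambda):

env = gym.make('CartPole-v0')
env = envs.StateLambda(env, lambda state: state * 2.0)    # assumed: transform each observation
env = envs.ActionLambda(env, lambda action: int(action))  # assumed: transform each action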

ActionSpaceScaler

ActionSpaceScaler(env, clip=1.0)

Scales the action space to be in the range (-clip, clip).

Adapted from Vitchyr Pong's RLkit: https://github.com/vitchyr/rlkit/blob/master/rlkit/envs/wrappers.py#L41
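
Example

A minimal sketch on a continuous-control task; that actions given in (-clip, clip) are rescaled to the environment's original bounds before stepping is an assumption:

env = gym.make('Pendulum-v0')                # actions originally bounded by [-2, 2]
env = envs.ActionSpaceScaler(env, clip=1.0)  # exposed action space becomes (-1.0, 1.0)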