actorcritic.envs.atari.wrappers

Contains wrappers that can wrap around environments to modify their functionality. The implementations of these wrappers are adapted from OpenAI.

Classes
AtariClipRewardWrapper(env)
    A wrapper that clips the rewards between -1 and 1.

AtariEpisodicLifeWrapper(env)
    A wrapper that ends episodes (returns terminal = True) after a life in the Atari game has been lost.

AtariFireResetWrapper(env)
    A wrapper that executes the ‘FIRE’ action after the environment has been reset.

AtariFrameskipWrapper(env, frameskip)
    A wrapper that skips frames.

AtariInfoClearWrapper(env)
    A wrapper that removes unnecessary data from the info returned by gym.Env.step().

AtariNoopResetWrapper(env, noop_max)
    A wrapper that executes a random number of ‘NOOP’ actions.

AtariPreprocessFrameWrapper(env)
    A wrapper that scales the observations from 210x160 down to 84x84 and converts from RGB to grayscale by extracting the luminance.

EpisodeInfoWrapper(env)
    A wrapper that stores episode information in the info returned by gym.Env.step() at the end of an episode.

FrameStackWrapper(env, num_stacked_frames)
    A wrapper that stacks the last observations.

RenderWrapper(env[, fps])
    A wrapper that calls gym.Env.render() every step.
class actorcritic.envs.atari.wrappers.AtariClipRewardWrapper(env)
    Bases: gym.core.RewardWrapper

    A wrapper that clips the rewards between -1 and 1.
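The clipping itself is straightforward and can be sketched without gym. The `clip_reward` helper below is hypothetical (not part of this module) and only illustrates the transformation the wrapper applies to each reward:

```python
def clip_reward(reward):
    """Clip a scalar reward to the interval [-1, 1]."""
    return max(-1.0, min(1.0, reward))

# Large-magnitude Atari scores collapse to the unit interval,
# which keeps gradient magnitudes comparable across games:
print(clip_reward(200.0))  # -> 1.0
print(clip_reward(-8.0))   # -> -1.0
print(clip_reward(0.5))    # -> 0.5
```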
class actorcritic.envs.atari.wrappers.AtariEpisodicLifeWrapper(env)
    Bases: gym.core.Wrapper

    A wrapper that ends episodes (returns terminal = True) after a life in the Atari game has been lost.

    reset(**kwargs)
        Resets the state of the environment and returns an initial observation.

        Returns: observation (object) – the initial observation of the environment.

    step(action)
        Runs one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset the environment’s state.

        Accepts an action and returns a tuple (observation, reward, done, info).

        Parameters:
            action (object) – an action provided by the agent
        Returns:
            observation (object) – the agent’s observation of the current environment
            reward (float) – the amount of reward returned after the previous action
            done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results
            info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)
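The life-loss bookkeeping can be sketched as follows. This is a standalone sketch of the idea, not the wrapper’s actual code; in the real wrapper the lives count is assumed to come from the emulator (e.g. an `info['ale.lives']` entry):

```python
class EpisodicLifeLogic:
    """Tracks remaining lives and converts a life loss into a terminal signal."""

    def __init__(self):
        self.lives = 0
        self.real_done = True  # whether the underlying game actually ended

    def on_step(self, done, lives):
        self.real_done = done
        # Report the episode as terminal when a life was lost, even though
        # the underlying game has not actually ended.
        if 0 < lives < self.lives:
            done = True
        self.lives = lives
        return done

logic = EpisodicLifeLogic()
logic.lives = 3  # e.g. Breakout starts with 3 lives
first = logic.on_step(done=False, lives=3)   # no life lost
second = logic.on_step(done=False, lives=2)  # one life lost
print(first, second)  # -> False True
```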
class actorcritic.envs.atari.wrappers.AtariFireResetWrapper(env)
    Bases: gym.core.Wrapper

    A wrapper that executes the ‘FIRE’ action after the environment has been reset.

    reset(**kwargs)
        Resets the state of the environment and returns an initial observation.

        Returns: observation (object) – the initial observation of the environment.

    step(action)
        Runs one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset the environment’s state.

        Accepts an action and returns a tuple (observation, reward, done, info).

        Parameters:
            action (object) – an action provided by the agent
        Returns:
            observation (object) – the agent’s observation of the current environment
            reward (float) – the amount of reward returned after the previous action
            done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results
            info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class actorcritic.envs.atari.wrappers.AtariFrameskipWrapper(env, frameskip)
    Bases: gym.core.Wrapper

    A wrapper that skips frames.

    __init__(env, frameskip)
        Parameters:
            env (gym.Env) – An environment that will be wrapped.
            frameskip (int) – Every frameskip-th frame is used. The remaining frames are skipped.

    reset(**kwargs)
        Resets the state of the environment and returns an initial observation.

        Returns: observation (object) – the initial observation of the environment.

    step(action)
        Runs one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset the environment’s state.

        Accepts an action and returns a tuple (observation, reward, done, info).

        Parameters:
            action (object) – an action provided by the agent
        Returns:
            observation (object) – the agent’s observation of the current environment
            reward (float) – the amount of reward returned after the previous action
            done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results
            info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)
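The skipping loop can be sketched as below. `env_step` stands in for the wrapped environment’s step function, and summing the intermediate rewards is an assumption (common in frameskip implementations); the wrapper may instead discard them along with the skipped frames:

```python
def frameskip_step(env_step, action, frameskip):
    """Repeat `action` for `frameskip` frames, accumulating the reward,
    and return only the last transition."""
    total_reward = 0.0
    obs, done, info = None, False, {}
    for _ in range(frameskip):
        obs, reward, done, info = env_step(action)
        total_reward += reward
        if done:
            break
    return obs, total_reward, done, info

# A fake environment that gives reward 1.0 per frame and ends after 10 frames.
frames = {"count": 0}
def fake_step(action):
    frames["count"] += 1
    return frames["count"], 1.0, frames["count"] >= 10, {}

obs, reward, done, info = frameskip_step(fake_step, action=0, frameskip=4)
print(obs, reward, done)  # -> 4 4.0 False
```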
class actorcritic.envs.atari.wrappers.AtariInfoClearWrapper(env)
    Bases: gym.core.Wrapper

    A wrapper that removes unnecessary data from the info returned by gym.Env.step(). This reduces the amount of inter-process data.

    Warning: AtariEpisodicLifeWrapper does not work after this wrapper, so it should be applied before it.

    reset(**kwargs)
        Resets the state of the environment and returns an initial observation.

        Returns: observation (object) – the initial observation of the environment.

    step(action)
        Runs one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset the environment’s state.

        Accepts an action and returns a tuple (observation, reward, done, info).

        Parameters:
            action (object) – an action provided by the agent
        Returns:
            observation (object) – the agent’s observation of the current environment
            reward (float) – the amount of reward returned after the previous action
            done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results
            info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class actorcritic.envs.atari.wrappers.AtariNoopResetWrapper(env, noop_max)
    Bases: gym.core.Wrapper

    A wrapper that executes a random number of ‘NOOP’ actions.

    __init__(env, noop_max)
        Parameters:
            env (gym.Env) – An environment that will be wrapped.
            noop_max (int) – The maximum number of ‘NOOP’ actions. The number is selected randomly between 1 and noop_max.

    reset(**kwargs)
        Resets the state of the environment and returns an initial observation.

        Returns: observation (object) – the initial observation of the environment.

    step(action)
        Runs one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset the environment’s state.

        Accepts an action and returns a tuple (observation, reward, done, info).

        Parameters:
            action (object) – an action provided by the agent
        Returns:
            observation (object) – the agent’s observation of the current environment
            reward (float) – the amount of reward returned after the previous action
            done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results
            info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)
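The randomized-start logic can be sketched as follows. `env_reset` and `env_step` stand in for the wrapped environment, and treating action 0 as ‘NOOP’ is an assumption (it is the conventional NOOP index in the Atari action set):

```python
import random

def noop_reset(env_reset, env_step, noop_max, noop_action=0):
    """Reset, then take between 1 and noop_max NOOP actions so the agent
    starts from a randomized initial state rather than a fixed one."""
    obs = env_reset()
    num_noops = random.randint(1, noop_max)
    for _ in range(num_noops):
        obs, _, done, _ = env_step(noop_action)
        if done:  # the game ended during the noops; start over
            obs = env_reset()
    return obs

# Fake environment: the observation is simply a step counter.
state = {"t": 0}
def fake_reset():
    state["t"] = 0
    return state["t"]
def fake_step(action):
    state["t"] += 1
    return state["t"], 0.0, False, {}

obs = noop_reset(fake_reset, fake_step, noop_max=30)
print(1 <= obs <= 30)  # -> True
```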
class actorcritic.envs.atari.wrappers.AtariPreprocessFrameWrapper(env)
    Bases: gym.core.ObservationWrapper

    A wrapper that scales the observations from 210x160 down to 84x84 and converts from RGB to grayscale by extracting the luminance.
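The grayscale conversion can be sketched with numpy alone. The helper below is illustrative, not the wrapper’s code: it applies the standard ITU-R BT.601 luminance weights and omits the 210x160 → 84x84 resize (which needs an image library) to stay dependency-free:

```python
import numpy as np

def rgb_to_luminance(frame):
    """Convert an RGB frame of shape (H, W, 3) to a grayscale (H, W) frame
    using the BT.601 luminance weights: Y = 0.299 R + 0.587 G + 0.114 B."""
    return np.dot(frame[..., :3].astype(np.float32), [0.299, 0.587, 0.114])

# A pure-red Atari-sized frame:
frame = np.zeros((210, 160, 3), dtype=np.uint8)
frame[..., 0] = 255
gray = rgb_to_luminance(frame)
print(gray.shape)                    # -> (210, 160)
print(round(float(gray[0, 0]), 3))   # -> 76.245  (i.e. 0.299 * 255)
```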
class actorcritic.envs.atari.wrappers.EpisodeInfoWrapper(env)
    Bases: gym.core.Wrapper

    A wrapper that stores episode information in the info returned by gym.Env.step() at the end of an episode. More specifically, if an episode is terminal, info will contain the key ‘episode’, whose value is a dict containing ‘total_reward’, the cumulative reward of the episode.

    Note: If you want to get the cumulative reward of the entire episode, AtariEpisodicLifeWrapper should be used after this wrapper.

    static get_episode_rewards_from_info_batch(infos)
        Utility function that extracts the episode rewards, inserted by the EpisodeInfoWrapper, out of the infos.

        Parameters:
            infos (list of list) – A batch-major list of infos as returned by interact().
        Returns:
            numpy.ndarray – A batch-major array with the same shape as infos. It contains the episode reward of an info at the corresponding position. If no episode reward was in an info, the result contains numpy.nan at that position.

    reset(**kwargs)
        Resets the state of the environment and returns an initial observation.

        Returns: observation (object) – the initial observation of the environment.

    step(action)
        Runs one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset the environment’s state.

        Accepts an action and returns a tuple (observation, reward, done, info).

        Parameters:
            action (object) – an action provided by the agent
        Returns:
            observation (object) – the agent’s observation of the current environment
            reward (float) – the amount of reward returned after the previous action
            done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results
            info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)
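A sketch of what get_episode_rewards_from_info_batch does, based on the docstring above. The function below is a hypothetical reimplementation, and the exact nesting of the infos batch is an assumption (a list of lists of info dicts):

```python
import numpy as np

def episode_rewards_from_infos(infos):
    """Extract info['episode']['total_reward'] entries from a batch-major
    list of lists of info dicts. Positions without episode info become NaN."""
    result = np.full((len(infos), len(infos[0])), np.nan)
    for i, batch in enumerate(infos):
        for j, info in enumerate(batch):
            if "episode" in info:
                result[i, j] = info["episode"]["total_reward"]
    return result

# Two timesteps of two environments; episodes ended at different positions:
infos = [
    [{}, {"episode": {"total_reward": 12.0}}],
    [{"episode": {"total_reward": 3.5}}, {}],
]
rewards = episode_rewards_from_infos(infos)
print(rewards[0, 1], rewards[1, 0])  # -> 12.0 3.5
```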
class actorcritic.envs.atari.wrappers.FrameStackWrapper(env, num_stacked_frames)
    Bases: gym.core.Wrapper

    A wrapper that stacks the last observations. The observations returned by this wrapper consist of the last num_stacked_frames frames.

    __init__(env, num_stacked_frames)
        Parameters:
            env (gym.Env) – An environment that will be wrapped.
            num_stacked_frames (int) – The number of frames that will be stacked.

    reset(**kwargs)
        Resets the state of the environment and returns an initial observation.

        Returns: observation (object) – the initial observation of the environment.

    step(action)
        Runs one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset the environment’s state.

        Accepts an action and returns a tuple (observation, reward, done, info).

        Parameters:
            action (object) – an action provided by the agent
        Returns:
            observation (object) – the agent’s observation of the current environment
            reward (float) – the amount of reward returned after the previous action
            done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results
            info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)
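Frame stacking is commonly implemented with a bounded deque. The class below is a sketch of that idea, not the wrapper’s code; repeating the first frame on reset so the stack is always full, and stacking along a new last axis, are both assumptions:

```python
from collections import deque

import numpy as np

class FrameStacker:
    """Keeps the last num_stacked_frames observations and concatenates
    them along a new last axis."""

    def __init__(self, num_stacked_frames):
        self.num_stacked_frames = num_stacked_frames
        self.frames = deque(maxlen=num_stacked_frames)  # old frames fall off

    def reset(self, obs):
        # Fill the stack by repeating the initial frame.
        for _ in range(self.num_stacked_frames):
            self.frames.append(obs)
        return self.observation()

    def step(self, obs):
        self.frames.append(obs)
        return self.observation()

    def observation(self):
        return np.stack(self.frames, axis=-1)

stacker = FrameStacker(num_stacked_frames=4)
obs = stacker.reset(np.zeros((84, 84)))
print(obs.shape)   # -> (84, 84, 4)
obs = stacker.step(np.ones((84, 84)))
print(obs[0, 0])   # -> [0. 0. 0. 1.]  (newest frame last)
```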
class actorcritic.envs.atari.wrappers.RenderWrapper(env, fps=None)
    Bases: gym.core.Wrapper

    A wrapper that calls gym.Env.render() every step.

    reset(**kwargs)
        Resets the state of the environment and returns an initial observation.

        Returns: observation (object) – the initial observation of the environment.

    step(action)
        Runs one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset the environment’s state.

        Accepts an action and returns a tuple (observation, reward, done, info).

        Parameters:
            action (object) – an action provided by the agent
        Returns:
            observation (object) – the agent’s observation of the current environment
            reward (float) – the amount of reward returned after the previous action
            done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results
            info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)