actorcritic.envs.atari.wrappers
Contains wrappers that modify the behavior of the environments they wrap.
The implementations of these wrappers are adapted from OpenAI.
Classes
AtariClipRewardWrapper(env) | A wrapper that clips the rewards between -1 and 1.
AtariEpisodicLifeWrapper(env) | A wrapper that ends episodes (returns terminal = True) after a life in the Atari game has been lost.
AtariFireResetWrapper(env) | A wrapper that executes the ‘FIRE’ action after the environment has been reset.
AtariFrameskipWrapper(env, frameskip) | A wrapper that skips frames.
AtariInfoClearWrapper(env) | A wrapper that removes unnecessary data from the info returned by gym.Env.step().
AtariNoopResetWrapper(env, noop_max) | A wrapper that executes a random number of ‘NOOP’ actions.
AtariPreprocessFrameWrapper(env) | A wrapper that scales the observations from 210x160 down to 84x84 and converts from RGB to grayscale by extracting the luminance.
EpisodeInfoWrapper(env) | A wrapper that stores episode information in the info returned by gym.Env.step() at the end of an episode.
FrameStackWrapper(env, num_stacked_frames) | A wrapper that stacks the last observations.
RenderWrapper(env[, fps]) | A wrapper that calls gym.Env.render() every step.
class actorcritic.envs.atari.wrappers.AtariClipRewardWrapper(env)
Bases: gym.core.RewardWrapper
A wrapper that clips the rewards between -1 and 1.
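The clipping described above can be illustrated with a minimal sketch. The helper name clip_reward is hypothetical and not part of this module; the sketch only shows what clamping a scalar reward to the interval [-1, 1] means and is not the wrapper's actual implementation.

```python
def clip_reward(reward, low=-1.0, high=1.0):
    """Clamp a scalar reward to the closed interval [low, high].

    Hypothetical helper for illustration only.
    """
    return max(low, min(high, reward))


# Rewards outside [-1, 1] are clamped; rewards inside are left unchanged.
clipped = [clip_reward(r) for r in (-5.0, -0.5, 0.0, 0.25, 10.0)]
print(clipped)
```

Clipping keeps the reward scale comparable across games whose raw scores differ by orders of magnitude.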
class actorcritic.envs.atari.wrappers.AtariEpisodicLifeWrapper(env)
Bases: gym.core.Wrapper
A wrapper that ends episodes (returns terminal = True) after a life in the Atari game has been lost.
reset(**kwargs)
Resets the state of the environment and returns an initial observation.
Returns: observation (object) – the initial observation of the space.
step(action)
Runs one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
Parameters: action (object) – an action provided by the agent.
Returns:
- observation (object) – the agent’s observation of the current environment.
- reward (float) – the amount of reward returned after the previous action.
- done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results.
- info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning).
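To make the "terminal after a lost life" rule concrete, here is a small sketch of the decision logic. The function episodic_done and the way the life counter is passed in are assumptions for illustration; how the wrapper actually reads the number of lives is not stated in this documentation.

```python
def episodic_done(game_over, previous_lives, current_lives):
    """Report a terminal step when the game is over or a life has been lost.

    Hypothetical helper; the life counts are assumed to be supplied by the caller.
    """
    return game_over or current_lives < previous_lives


# Simulated life counter and game-over flag for six consecutive steps.
lives_per_step = [3, 3, 2, 2, 1, 0]
game_over_per_step = [False, False, False, False, False, True]

previous = 3
terminals = []
for lives, game_over in zip(lives_per_step, game_over_per_step):
    terminals.append(episodic_done(game_over, previous, lives))
    previous = lives

print(terminals)
```

Treating each life as its own episode gives the agent a learning signal at every lost life instead of only at the end of the game.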
class actorcritic.envs.atari.wrappers.AtariFireResetWrapper(env)
Bases: gym.core.Wrapper
A wrapper that executes the ‘FIRE’ action after the environment has been reset.
reset(**kwargs)
Resets the state of the environment and returns an initial observation.
Returns: observation (object) – the initial observation of the space.
step(action)
Runs one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
Parameters: action (object) – an action provided by the agent.
Returns:
- observation (object) – the agent’s observation of the current environment.
- reward (float) – the amount of reward returned after the previous action.
- done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results.
- info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning).
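The following sketch shows the general idea of firing once after a reset. It uses a toy stand-in environment rather than Gym, and the names ToyEnv, FireOnReset, and the action index FIRE = 1 are all assumptions made for illustration; the real index of ‘FIRE’ depends on the game.

```python
FIRE = 1  # assumed action index for 'FIRE'; game dependent in practice


class ToyEnv:
    """Stand-in environment that records the actions it receives."""

    def __init__(self):
        self.actions = []

    def reset(self, **kwargs):
        self.actions = []
        return 'start'

    def step(self, action):
        self.actions.append(action)
        return 'obs', 0.0, False, {}


class FireOnReset:
    """Hypothetical wrapper: performs one 'FIRE' step right after reset."""

    def __init__(self, env):
        self.env = env

    def reset(self, **kwargs):
        self.env.reset(**kwargs)
        # The observation after firing becomes the initial observation.
        observation, _, _, _ = self.env.step(FIRE)
        return observation

    def step(self, action):
        return self.env.step(action)


env = ToyEnv()
wrapped = FireOnReset(env)
first_observation = wrapped.reset()
```

Some Atari games do not start until ‘FIRE’ is pressed, so firing on reset spares the agent from having to learn this. The sketch ignores the case where the episode ends during the ‘FIRE’ step.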
class actorcritic.envs.atari.wrappers.AtariFrameskipWrapper(env, frameskip)
Bases: gym.core.Wrapper
A wrapper that skips frames.
__init__(env, frameskip)
Parameters:
- env (gym.Env) – An environment that will be wrapped.
- frameskip (int) – Every frameskip-th frame is used. The remaining frames are skipped.
reset(**kwargs)
Resets the state of the environment and returns an initial observation.
Returns: observation (object) – the initial observation of the space.
step(action)
Runs one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
Parameters: action (object) – an action provided by the agent.
Returns:
- observation (object) – the agent’s observation of the current environment.
- reward (float) – the amount of reward returned after the previous action.
- done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results.
- info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning).
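One common way to implement frame skipping is to repeat the chosen action for frameskip frames and return only the last observation. The sketch below follows that pattern with a toy environment. CountingEnv and skip_step are hypothetical names, and summing the rewards of the skipped frames is an assumption, since this documentation does not say how skipped rewards are handled.

```python
class CountingEnv:
    """Toy environment: the observation is the frame index, reward is 1 per frame."""

    def __init__(self, length=10):
        self.length = length
        self.frame = 0

    def reset(self, **kwargs):
        self.frame = 0
        return self.frame

    def step(self, action):
        self.frame += 1
        return self.frame, 1.0, self.frame >= self.length, {}


def skip_step(env, action, frameskip):
    """Repeat `action` for up to `frameskip` frames (frameskip >= 1).

    Returns the last observation and the summed reward (an assumption).
    """
    total_reward = 0.0
    for _ in range(frameskip):
        observation, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break  # do not step a finished episode
    return observation, total_reward, done, info


env = CountingEnv(length=10)
env.reset()
first = skip_step(env, action=0, frameskip=4)
second = skip_step(env, action=0, frameskip=4)
third = skip_step(env, action=0, frameskip=4)
```

With frameskip = 4 the agent observes frames 4 and 8, and the third call stops early at frame 10 because the toy episode ends there.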
class actorcritic.envs.atari.wrappers.AtariInfoClearWrapper(env)
Bases: gym.core.Wrapper
A wrapper that removes unnecessary data from the info returned by gym.Env.step(). This reduces the amount of inter-process data.
Warning: AtariEpisodicLifeWrapper does not work after this wrapper has been applied, so it should be applied before this one.
reset(**kwargs)
Resets the state of the environment and returns an initial observation.
Returns: observation (object) – the initial observation of the space.
step(action)
Runs one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
Parameters: action (object) – an action provided by the agent.
Returns:
- observation (object) – the agent’s observation of the current environment.
- reward (float) – the amount of reward returned after the previous action.
- done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results.
- info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning).
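As a rough idea of what "removing unnecessary data" from info might look like, the sketch below filters an info dict down to a whitelist of keys. Which keys the wrapper actually keeps or drops is not documented here, so the clear_info helper and its keep parameter are purely hypothetical.

```python
def clear_info(info, keep=('episode',)):
    """Return a copy of `info` containing only the whitelisted keys.

    Hypothetical helper; the whitelist is an assumption for illustration.
    """
    return {key: value for key, value in info.items() if key in keep}


# Example info dict; 'ale.lives' is a key that Gym's Atari environments report.
raw_info = {'ale.lives': 3, 'episode': {'total_reward': 5.0}}
small_info = clear_info(raw_info)
print(small_info)
```

A smaller info dict means less data to serialize when environments run in separate processes.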
class actorcritic.envs.atari.wrappers.AtariNoopResetWrapper(env, noop_max)
Bases: gym.core.Wrapper
A wrapper that executes a random number of ‘NOOP’ actions.
__init__(env, noop_max)
Parameters:
- env (gym.Env) – An environment that will be wrapped.
- noop_max (int) – The maximum number of ‘NOOP’ actions. The number is selected randomly between 1 and noop_max.
reset(**kwargs)
Resets the state of the environment and returns an initial observation.
Returns: observation (object) – the initial observation of the space.
step(action)
Runs one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
Parameters: action (object) – an action provided by the agent.
Returns:
- observation (object) – the agent’s observation of the current environment.
- reward (float) – the amount of reward returned after the previous action.
- done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results.
- info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning).
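The random ‘NOOP’ start can be sketched as follows. The function noop_reset, the toy environment, and the action index NOOP = 0 are assumptions for illustration; the sketch also assumes that the ‘NOOP’ actions are executed as part of reset, which the class name suggests but this documentation does not spell out.

```python
import random

NOOP = 0  # assumed action index for 'NOOP'


class ToyEnv:
    """Stand-in environment that counts steps and never terminates."""

    def __init__(self):
        self.steps = 0

    def reset(self, **kwargs):
        self.steps = 0
        return 'start'

    def step(self, action):
        self.steps += 1
        return 'obs', 0.0, False, {}


def noop_reset(env, noop_max, rng=random):
    """Reset `env`, then execute between 1 and `noop_max` 'NOOP' actions."""
    observation = env.reset()
    num_noops = rng.randint(1, noop_max)  # inclusive on both ends
    for _ in range(num_noops):
        observation, _, done, _ = env.step(NOOP)
        if done:
            observation = env.reset()
    return observation, num_noops


env = ToyEnv()
observation, num_noops = noop_reset(env, noop_max=30, rng=random.Random(0))
```

Starting each episode after a random number of idle frames makes the initial state vary between episodes, which discourages the agent from memorizing a fixed opening sequence.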
class actorcritic.envs.atari.wrappers.AtariPreprocessFrameWrapper(env)
Bases: gym.core.ObservationWrapper
A wrapper that scales the observations from 210x160 down to 84x84 and converts from RGB to grayscale by extracting the luminance.
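The grayscale step can be illustrated with a small pure-Python sketch. It uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), which are a common choice; whether the wrapper uses exactly these weights is an assumption, and the resizing to 84x84 is left out. The function name luminance is hypothetical.

```python
def luminance(rgb_frame):
    """Convert a frame given as rows of (r, g, b) tuples to grayscale.

    Uses the ITU-R BT.601 luma weights (an assumption about the wrapper).
    """
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in rgb_frame
    ]


# A tiny 2x2 frame: black, white, pure red, pure green.
frame = [
    [(0, 0, 0), (255, 255, 255)],
    [(255, 0, 0), (0, 255, 0)],
]
gray = luminance(frame)
```

Green contributes most to the perceived brightness, so the pure green pixel ends up brighter than the pure red one.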
class actorcritic.envs.atari.wrappers.EpisodeInfoWrapper(env)
Bases: gym.core.Wrapper
A wrapper that stores episode information in the info returned by gym.Env.step() at the end of an episode. More specifically, when an episode ends, info will contain the key ‘episode’, whose value is a dict containing ‘total_reward’, the cumulative reward of the episode.
Note: If you want to get the cumulative reward of the entire episode, AtariEpisodicLifeWrapper should be used after this wrapper.
static get_episode_rewards_from_info_batch(infos)
Utility function that extracts the episode rewards inserted by the EpisodeInfoWrapper from the infos.
Parameters: infos (list of list) – A batch-major list of infos as returned by interact().
Returns: numpy.ndarray – A batch-major array with the same shape as infos. It contains the episode reward of the info at the corresponding position. Where an info contains no episode reward, the result contains numpy.nan.
reset(**kwargs)
Resets the state of the environment and returns an initial observation.
Returns: observation (object) – the initial observation of the space.
step(action)
Runs one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
Parameters: action (object) – an action provided by the agent.
Returns:
- observation (object) – the agent’s observation of the current environment.
- reward (float) – the amount of reward returned after the previous action.
- done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results.
- info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning).
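The extraction performed by get_episode_rewards_from_info_batch can be approximated in plain Python as shown below. The sketch returns nested lists with math.nan instead of a numpy.ndarray so that it has no dependencies, and the function name episode_rewards_from_infos is hypothetical; only the keys ‘episode’ and ‘total_reward’ come from the description above.

```python
import math


def episode_rewards_from_infos(infos):
    """Extract 'total_reward' from a batch-major list of info dicts.

    Positions without an 'episode' entry are filled with NaN.
    Hypothetical stand-in that returns nested lists, not a numpy array.
    """
    return [
        [info.get('episode', {}).get('total_reward', math.nan) for info in row]
        for row in infos
    ]


# Two environments, two steps each; only some steps ended an episode.
infos = [
    [{}, {'episode': {'total_reward': 21.0}}],
    [{'episode': {'total_reward': -3.0}}, {}],
]
rewards = episode_rewards_from_infos(infos)
```

Because unfinished steps are marked with NaN, the finished episodes can later be averaged with a NaN-aware mean.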
class actorcritic.envs.atari.wrappers.FrameStackWrapper(env, num_stacked_frames)
Bases: gym.core.Wrapper
A wrapper that stacks the last observations. The observations returned by this wrapper consist of the last num_stacked_frames frames.
__init__(env, num_stacked_frames)
Parameters:
- env (gym.Env) – An environment that will be wrapped.
- num_stacked_frames (int) – The number of frames that will be stacked.
reset(**kwargs)
Resets the state of the environment and returns an initial observation.
Returns: observation (object) – the initial observation of the space.
step(action)
Runs one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
Parameters: action (object) – an action provided by the agent.
Returns:
- observation (object) – the agent’s observation of the current environment.
- reward (float) – the amount of reward returned after the previous action.
- done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results.
- info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning).
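Frame stacking is often implemented with a fixed-length queue, as in the sketch below. The class FrameStack and its methods are hypothetical, and filling the stack with copies of the first observation on reset is an assumption; the documentation above does not say how the stack is initialized.

```python
from collections import deque


class FrameStack:
    """Hypothetical helper that keeps the last `num_stacked_frames` frames."""

    def __init__(self, num_stacked_frames):
        # A deque with maxlen drops the oldest frame automatically.
        self.frames = deque(maxlen=num_stacked_frames)

    def reset(self, observation):
        # Assumption: the stack starts filled with the initial observation.
        for _ in range(self.frames.maxlen):
            self.frames.append(observation)
        return list(self.frames)

    def push(self, observation):
        self.frames.append(observation)
        return list(self.frames)


stack = FrameStack(num_stacked_frames=4)
initial = stack.reset('f0')
after_one = stack.push('f1')
for frame in ('f2', 'f3', 'f4'):
    latest = stack.push(frame)
```

Stacking consecutive frames lets a feed-forward policy infer motion, such as the direction of a ball, that a single frame cannot show.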
class actorcritic.envs.atari.wrappers.RenderWrapper(env, fps=None)
Bases: gym.core.Wrapper
A wrapper that calls gym.Env.render() every step.
reset(**kwargs)
Resets the state of the environment and returns an initial observation.
Returns: observation (object) – the initial observation of the space.
step(action)
Runs one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
Parameters: action (object) – an action provided by the agent.
Returns:
- observation (object) – the agent’s observation of the current environment.
- reward (float) – the amount of reward returned after the previous action.
- done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results.
- info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning).
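A render-on-every-step wrapper might look roughly like the sketch below, written against a toy environment instead of Gym. The names ToyEnv and RenderEveryStep are hypothetical. The meaning of fps is not documented above, so treating it as a cap on the step rate (sleeping 1 / fps seconds after each render) is an assumption.

```python
import time


class ToyEnv:
    """Stand-in environment that counts how often render() is called."""

    def __init__(self):
        self.render_calls = 0

    def reset(self, **kwargs):
        return 'start'

    def step(self, action):
        return 'obs', 0.0, False, {}

    def render(self):
        self.render_calls += 1


class RenderEveryStep:
    """Hypothetical wrapper: renders after every step, optionally throttled."""

    def __init__(self, env, fps=None):
        self.env = env
        # Assumption: fps limits the step rate; None means no throttling.
        self.delay = None if fps is None else 1.0 / fps

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        result = self.env.step(action)
        self.env.render()
        if self.delay is not None:
            time.sleep(self.delay)
        return result


env = ToyEnv()
wrapped = RenderEveryStep(env, fps=1000)
wrapped.reset()
for _ in range(3):
    wrapped.step(0)
```

Rendering inside the wrapper means the training or evaluation loop does not need its own render call.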