actorcritic.envs.atari.wrappers

Contains wrappers that can wrap around environments to modify their functionality. The implementations of these wrappers are adapted from OpenAI.

Classes
AtariClipRewardWrapper(env)
    A wrapper that clips the rewards between -1 and 1.

AtariEpisodicLifeWrapper(env)
    A wrapper that ends episodes (returns terminal = True) after a life in the Atari game has been lost.

AtariFireResetWrapper(env)
    A wrapper that executes the ‘FIRE’ action after the environment has been reset.

AtariFrameskipWrapper(env, frameskip)
    A wrapper that skips frames.

AtariInfoClearWrapper(env)
    A wrapper that removes unnecessary data from the info returned by gym.Env.step().

AtariNoopResetWrapper(env, noop_max)
    A wrapper that executes a random number of ‘NOOP’ actions.

AtariPreprocessFrameWrapper(env)
    A wrapper that scales the observations from 210x160 down to 84x84 and converts from RGB to grayscale by extracting the luminance.

EpisodeInfoWrapper(env)
    A wrapper that stores episode information in the info returned by gym.Env.step() at the end of an episode.

FrameStackWrapper(env, num_stacked_frames)
    A wrapper that stacks the last observations.

RenderWrapper(env[, fps])
    A wrapper that calls gym.Env.render() every step.
class actorcritic.envs.atari.wrappers.AtariClipRewardWrapper(env)
    Bases: gym.core.RewardWrapper

    A wrapper that clips the rewards between -1 and 1.
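The clipping itself is straightforward and can be sketched without gym. The `clip_reward` helper below is hypothetical (not part of this module) and only illustrates the transformation the wrapper applies to each reward:

```python
def clip_reward(reward):
    """Clip a scalar reward to the interval [-1, 1]."""
    return max(-1.0, min(1.0, reward))

# Large-magnitude Atari scores collapse to the unit interval,
# which keeps gradient magnitudes comparable across games:
print(clip_reward(200.0))  # -> 1.0
print(clip_reward(-8.0))   # -> -1.0
print(clip_reward(0.5))    # -> 0.5
```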
class actorcritic.envs.atari.wrappers.AtariEpisodicLifeWrapper(env)
    Bases: gym.core.Wrapper

    A wrapper that ends episodes (returns terminal = True) after a life in the Atari game has been lost.

    reset(**kwargs)
        Resets the state of the environment and returns an initial observation.

        Returns: observation (object) – the initial observation of the environment.

    step(action)
        Runs one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset the environment’s state.

        Accepts an action and returns a tuple (observation, reward, done, info).

        Parameters:
            action (object) – an action provided by the agent
        Returns:
            observation (object) – the agent’s observation of the current environment
            reward (float) – the amount of reward returned after the previous action
            done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results
            info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)
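The life-loss bookkeeping can be sketched as follows. This is a standalone sketch of the idea, not the wrapper’s actual code; in the real wrapper the lives count is assumed to come from the emulator (e.g. an `info['ale.lives']` entry):

```python
class EpisodicLifeLogic:
    """Tracks remaining lives and converts a life loss into a terminal signal."""

    def __init__(self):
        self.lives = 0
        self.real_done = True  # whether the underlying game actually ended

    def on_step(self, done, lives):
        self.real_done = done
        # Report the episode as terminal when a life was lost, even though
        # the underlying game has not actually ended.
        if 0 < lives < self.lives:
            done = True
        self.lives = lives
        return done

logic = EpisodicLifeLogic()
logic.lives = 3  # e.g. Breakout starts with 3 lives
first = logic.on_step(done=False, lives=3)   # no life lost
second = logic.on_step(done=False, lives=2)  # one life lost
print(first, second)  # -> False True
```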
class actorcritic.envs.atari.wrappers.AtariFireResetWrapper(env)
    Bases: gym.core.Wrapper

    A wrapper that executes the ‘FIRE’ action after the environment has been reset.

    reset(**kwargs)
        Resets the state of the environment and returns an initial observation.

        Returns: observation (object) – the initial observation of the environment.

    step(action)
        Runs one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset the environment’s state.

        Accepts an action and returns a tuple (observation, reward, done, info).

        Parameters:
            action (object) – an action provided by the agent
        Returns:
            observation (object) – the agent’s observation of the current environment
            reward (float) – the amount of reward returned after the previous action
            done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results
            info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class actorcritic.envs.atari.wrappers.AtariFrameskipWrapper(env, frameskip)
    Bases: gym.core.Wrapper

    A wrapper that skips frames.

    __init__(env, frameskip)
        Parameters:
            env (gym.Env) – An environment that will be wrapped.
            frameskip (int) – Every frameskip-th frame is used. The remaining frames are skipped.

    reset(**kwargs)
        Resets the state of the environment and returns an initial observation.

        Returns: observation (object) – the initial observation of the environment.

    step(action)
        Runs one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset the environment’s state.

        Accepts an action and returns a tuple (observation, reward, done, info).

        Parameters:
            action (object) – an action provided by the agent
        Returns:
            observation (object) – the agent’s observation of the current environment
            reward (float) – the amount of reward returned after the previous action
            done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results
            info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)
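The skipping loop can be sketched as below. `env_step` stands in for the wrapped environment’s step function, and summing the intermediate rewards is an assumption (common in frameskip implementations); the wrapper may instead discard them along with the skipped frames:

```python
def frameskip_step(env_step, action, frameskip):
    """Repeat `action` for `frameskip` frames, accumulating the reward,
    and return only the last transition."""
    total_reward = 0.0
    obs, done, info = None, False, {}
    for _ in range(frameskip):
        obs, reward, done, info = env_step(action)
        total_reward += reward
        if done:
            break
    return obs, total_reward, done, info

# A fake environment that gives reward 1.0 per frame and ends after 10 frames.
frames = {"count": 0}
def fake_step(action):
    frames["count"] += 1
    return frames["count"], 1.0, frames["count"] >= 10, {}

obs, reward, done, info = frameskip_step(fake_step, action=0, frameskip=4)
print(obs, reward, done)  # -> 4 4.0 False
```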
class actorcritic.envs.atari.wrappers.AtariInfoClearWrapper(env)
    Bases: gym.core.Wrapper

    A wrapper that removes unnecessary data from the info returned by gym.Env.step(). This reduces the amount of inter-process data.

    Warning: AtariEpisodicLifeWrapper does not work after this wrapper, so it should be applied before it.

    reset(**kwargs)
        Resets the state of the environment and returns an initial observation.

        Returns: observation (object) – the initial observation of the environment.

    step(action)
        Runs one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset the environment’s state.

        Accepts an action and returns a tuple (observation, reward, done, info).

        Parameters:
            action (object) – an action provided by the agent
        Returns:
            observation (object) – the agent’s observation of the current environment
            reward (float) – the amount of reward returned after the previous action
            done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results
            info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class actorcritic.envs.atari.wrappers.AtariNoopResetWrapper(env, noop_max)
    Bases: gym.core.Wrapper

    A wrapper that executes a random number of ‘NOOP’ actions.

    __init__(env, noop_max)
        Parameters:
            env (gym.Env) – An environment that will be wrapped.
            noop_max (int) – The maximum number of ‘NOOP’ actions. The number is selected randomly between 1 and noop_max.

    reset(**kwargs)
        Resets the state of the environment and returns an initial observation.

        Returns: observation (object) – the initial observation of the environment.

    step(action)
        Runs one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset the environment’s state.

        Accepts an action and returns a tuple (observation, reward, done, info).

        Parameters:
            action (object) – an action provided by the agent
        Returns:
            observation (object) – the agent’s observation of the current environment
            reward (float) – the amount of reward returned after the previous action
            done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results
            info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)
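The randomized-start logic can be sketched as follows. `env_reset` and `env_step` stand in for the wrapped environment, and treating action 0 as ‘NOOP’ is an assumption (it is the conventional NOOP index in the Atari action set):

```python
import random

def noop_reset(env_reset, env_step, noop_max, noop_action=0):
    """Reset, then take between 1 and noop_max NOOP actions so the agent
    starts from a randomized initial state rather than a fixed one."""
    obs = env_reset()
    num_noops = random.randint(1, noop_max)
    for _ in range(num_noops):
        obs, _, done, _ = env_step(noop_action)
        if done:  # the game ended during the noops; start over
            obs = env_reset()
    return obs

# Fake environment: the observation is simply a step counter.
state = {"t": 0}
def fake_reset():
    state["t"] = 0
    return state["t"]
def fake_step(action):
    state["t"] += 1
    return state["t"], 0.0, False, {}

obs = noop_reset(fake_reset, fake_step, noop_max=30)
print(1 <= obs <= 30)  # -> True
```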
class actorcritic.envs.atari.wrappers.AtariPreprocessFrameWrapper(env)
    Bases: gym.core.ObservationWrapper

    A wrapper that scales the observations from 210x160 down to 84x84 and converts from RGB to grayscale by extracting the luminance.
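The grayscale conversion can be sketched with numpy alone. The helper below is illustrative, not the wrapper’s code: it applies the standard ITU-R BT.601 luminance weights and omits the 210x160 → 84x84 resize (which needs an image library) to stay dependency-free:

```python
import numpy as np

def rgb_to_luminance(frame):
    """Convert an RGB frame of shape (H, W, 3) to a grayscale (H, W) frame
    using the BT.601 luminance weights: Y = 0.299 R + 0.587 G + 0.114 B."""
    return np.dot(frame[..., :3].astype(np.float32), [0.299, 0.587, 0.114])

# A pure-red Atari-sized frame:
frame = np.zeros((210, 160, 3), dtype=np.uint8)
frame[..., 0] = 255
gray = rgb_to_luminance(frame)
print(gray.shape)                    # -> (210, 160)
print(round(float(gray[0, 0]), 3))   # -> 76.245  (i.e. 0.299 * 255)
```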
class actorcritic.envs.atari.wrappers.EpisodeInfoWrapper(env)
    Bases: gym.core.Wrapper

    A wrapper that stores episode information in the info returned by gym.Env.step() at the end of an episode. More specifically, if an episode is terminal, info will contain the key ‘episode’, whose value is a dict containing ‘total_reward’, the cumulative reward of the episode.

    Note: If you want to get the cumulative reward of the entire episode, AtariEpisodicLifeWrapper should be used after this wrapper.

    static get_episode_rewards_from_info_batch(infos)
        Utility function that extracts the episode rewards, inserted by the EpisodeInfoWrapper, out of the infos.

        Parameters:
            infos (list of list) – A batch-major list of infos as returned by interact().
        Returns:
            numpy.ndarray – A batch-major array with the same shape as infos. It contains the episode reward of an info at the corresponding position. If no episode reward was in an info, the result contains numpy.nan at that position.

    reset(**kwargs)
        Resets the state of the environment and returns an initial observation.

        Returns: observation (object) – the initial observation of the environment.

    step(action)
        Runs one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset the environment’s state.

        Accepts an action and returns a tuple (observation, reward, done, info).

        Parameters:
            action (object) – an action provided by the agent
        Returns:
            observation (object) – the agent’s observation of the current environment
            reward (float) – the amount of reward returned after the previous action
            done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results
            info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)
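A sketch of what get_episode_rewards_from_info_batch does, based on the docstring above. The function below is a hypothetical reimplementation, and the exact nesting of the infos batch is an assumption (a list of lists of info dicts):

```python
import numpy as np

def episode_rewards_from_infos(infos):
    """Extract info['episode']['total_reward'] entries from a batch-major
    list of lists of info dicts. Positions without episode info become NaN."""
    result = np.full((len(infos), len(infos[0])), np.nan)
    for i, batch in enumerate(infos):
        for j, info in enumerate(batch):
            if "episode" in info:
                result[i, j] = info["episode"]["total_reward"]
    return result

# Two timesteps of two environments; episodes ended at different positions:
infos = [
    [{}, {"episode": {"total_reward": 12.0}}],
    [{"episode": {"total_reward": 3.5}}, {}],
]
rewards = episode_rewards_from_infos(infos)
print(rewards[0, 1], rewards[1, 0])  # -> 12.0 3.5
```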
class actorcritic.envs.atari.wrappers.FrameStackWrapper(env, num_stacked_frames)
    Bases: gym.core.Wrapper

    A wrapper that stacks the last observations. The observations returned by this wrapper consist of the last num_stacked_frames frames.

    __init__(env, num_stacked_frames)
        Parameters:
            env (gym.Env) – An environment that will be wrapped.
            num_stacked_frames (int) – The number of frames that will be stacked.

    reset(**kwargs)
        Resets the state of the environment and returns an initial observation.

        Returns: observation (object) – the initial observation of the environment.

    step(action)
        Runs one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset the environment’s state.

        Accepts an action and returns a tuple (observation, reward, done, info).

        Parameters:
            action (object) – an action provided by the agent
        Returns:
            observation (object) – the agent’s observation of the current environment
            reward (float) – the amount of reward returned after the previous action
            done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results
            info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)
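Frame stacking is commonly implemented with a bounded deque. The class below is a sketch of that idea, not the wrapper’s code; repeating the first frame on reset so the stack is always full, and stacking along a new last axis, are both assumptions:

```python
from collections import deque

import numpy as np

class FrameStacker:
    """Keeps the last num_stacked_frames observations and concatenates
    them along a new last axis."""

    def __init__(self, num_stacked_frames):
        self.num_stacked_frames = num_stacked_frames
        self.frames = deque(maxlen=num_stacked_frames)  # old frames fall off

    def reset(self, obs):
        # Fill the stack by repeating the initial frame.
        for _ in range(self.num_stacked_frames):
            self.frames.append(obs)
        return self.observation()

    def step(self, obs):
        self.frames.append(obs)
        return self.observation()

    def observation(self):
        return np.stack(self.frames, axis=-1)

stacker = FrameStacker(num_stacked_frames=4)
obs = stacker.reset(np.zeros((84, 84)))
print(obs.shape)   # -> (84, 84, 4)
obs = stacker.step(np.ones((84, 84)))
print(obs[0, 0])   # -> [0. 0. 0. 1.]  (newest frame last)
```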
class actorcritic.envs.atari.wrappers.RenderWrapper(env, fps=None)
    Bases: gym.core.Wrapper

    A wrapper that calls gym.Env.render() every step.

    reset(**kwargs)
        Resets the state of the environment and returns an initial observation.

        Returns: observation (object) – the initial observation of the environment.

    step(action)
        Runs one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset the environment’s state.

        Accepts an action and returns a tuple (observation, reward, done, info).

        Parameters:
            action (object) – an action provided by the agent
        Returns:
            observation (object) – the agent’s observation of the current environment
            reward (float) – the amount of reward returned after the previous action
            done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results
            info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)