actorcritic.multi_env

Contains classes that provide the ability to run multiple environments in subprocesses.

Functions

create_subprocess_envs(env_fns) Utility function that creates environments by calling the functions in env_fns and wrapping the returned environments in SubprocessEnvs.

Classes

MultiEnv(envs) An environment that maintains multiple SubprocessEnvs and executes them in parallel.
SubprocessEnv(env_fn) Maintains a gym.Env inside a subprocess, so it can run concurrently.
class actorcritic.multi_env.MultiEnv(envs)[source]

Bases: object

An environment that maintains multiple SubprocessEnvs and executes them in parallel.

The environments are reset automatically when a terminal state is reached, so reset() only has to be called once, at the beginning.
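As a rough illustration of the auto-reset behaviour, the sketch below uses a hypothetical CountdownEnv and AutoReset wrapper. Neither is part of actorcritic, and gym is not imported. It assumes that the wrapper replaces the terminal observation with the first observation of the next episode; the library's own wrapper may behave differently.

```python
class CountdownEnv:
    """Hypothetical toy environment: an episode lasts `length` steps."""

    def __init__(self, length):
        self.length = length
        self.remaining = length

    def reset(self):
        self.remaining = self.length
        return self.remaining  # the observation is the number of steps left

    def step(self, action):
        self.remaining -= 1
        terminal = self.remaining == 0
        return self.remaining, 1.0, terminal, {}


class AutoReset:
    """Hypothetical wrapper: resets the wrapped env when an episode ends."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        observation, reward, terminal, info = self.env.step(action)
        if terminal:
            # Assumption: hand back the first observation of the next episode
            # instead of the terminal observation.
            observation = self.env.reset()
        return observation, reward, terminal, info


env = AutoReset(CountdownEnv(length=2))
first = env.reset()  # called once, at the beginning
trace = [env.step(action=None) for _ in range(5)]
observations = [o for o, _, _, _ in trace]
terminals = [t for _, _, t, _ in trace]
```

The caller never calls reset() again, yet stepping continues across episode boundaries; the terminal flag is still reported so the caller can detect where one episode ended.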

__init__(envs)[source]
Parameters:envs (list of SubprocessEnv) – The environments. The observation and action spaces must be equal across the environments.
action_space

gym.spaces.Space – The action space used by all environments.

close()[source]

Closes all environments.

envs

list of gym.Env – The environments.

observation_space

gym.spaces.Space – The observation space used by all environments.

reset()[source]

Resets all environments.

Returns:list – A list of observations received from each environment.
step(actions)[source]

Proceeds one step in all environments.

Parameters:actions (list) – A list of actions to be executed in the environments.
Returns:tuple – A tuple of (observations, rewards, terminals, infos). Each element is a list containing the values received from the environments.
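The return shape of step() can be pictured as a transpose of the per-environment results. The sketch below assumes each environment yields an (observation, reward, terminal, info) tuple; the name per_env_results and its values are made up for illustration and do not come from the library.

```python
# Hypothetical results of stepping three environments once.
per_env_results = [
    ([0.1, 0.2], 1.0, False, {}),
    ([0.3, 0.4], 0.0, True, {"episode": 1}),
    ([0.5, 0.6], 1.0, False, {}),
]

# zip(*...) groups the i-th element of every tuple together, which turns
# "one tuple per environment" into "one list per kind of value".
observations, rewards, terminals, infos = (
    list(values) for values in zip(*per_env_results)
)
```

Index i in each of the four lists then refers to the same environment, which is why the order of actions passed to step() matters.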
class actorcritic.multi_env.SubprocessEnv(env_fn)[source]

Bases: gym.core.Env

Maintains a gym.Env inside a subprocess, so it can run concurrently. If the subprocess ends unexpectedly, it will be recreated automatically without interrupting the execution.

To use the subprocess, start() has to be called first. After that, initialize() has to be called to retrieve the observation space and the action space from the underlying environment. The purpose of splitting these methods is that multiple SubprocessEnvs can be created and started in parallel: start() does not block the execution and already begins creating the underlying gym.Env. Afterwards initialize(), which blocks the execution, can be called on each SubprocessEnv. See create_subprocess_envs(), which implements this idea.
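The two-phase pattern (a non-blocking start() followed by a blocking initialize()) can be sketched with the standard library alone. The SlowWorker class below is hypothetical and uses a thread as a stand-in for the subprocess so that the example stays self-contained; it is not how actorcritic is implemented.

```python
import threading
import time


class SlowWorker:
    """Hypothetical stand-in for a SubprocessEnv; uses a thread, not a process."""

    def __init__(self, setup_seconds):
        self.setup_seconds = setup_seconds
        self.spaces = None
        self._result = None
        self._thread = None

    def start(self):
        # Returns immediately; the slow setup runs in the background.
        self._thread = threading.Thread(target=self._setup)
        self._thread.start()

    def _setup(self):
        time.sleep(self.setup_seconds)  # pretend to build the environment
        self._result = ("observation_space", "action_space")

    def initialize(self):
        # Blocks until the background setup has finished.
        self._thread.join()
        self.spaces = self._result


workers = [SlowWorker(setup_seconds=0.2) for _ in range(4)]

began = time.monotonic()
for worker in workers:  # phase 1: start everything without waiting
    worker.start()
for worker in workers:  # phase 2: wait for each worker in turn
    worker.initialize()
elapsed = time.monotonic() - began
```

Because every setup is already running before the first initialize() call, the total wait is roughly that of the slowest worker, not the sum over all workers.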

__init__(env_fn)[source]
Parameters:env_fn (callable) – A function that returns a gym.Env. It will be called inside the subprocess, so it should not rely on variables that exist only in the main process. It may be called multiple times, since the subprocess is recreated when it ends unexpectedly.
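One way to keep env_fn self-contained and safe to call repeatedly is to pass everything it needs as arguments. The sketch below uses functools.partial with a hypothetical ToyEnv and make_env; a real env_fn would return a gym.Env instead.

```python
from functools import partial


class ToyEnv:
    """Hypothetical environment used only for this illustration."""

    def __init__(self, difficulty):
        self.difficulty = difficulty
        self.steps_taken = 0


def make_env(difficulty):
    # Everything the environment needs arrives through arguments;
    # nothing is read from state that lives only in the main process.
    return ToyEnv(difficulty)


# One zero-argument callable per environment.
env_fns = [partial(make_env, difficulty=d) for d in (1, 2, 3)]

first = env_fns[0]()
first.steps_taken = 10
recreated = env_fns[0]()  # e.g. after the subprocess had to be recreated
difficulties = [fn().difficulty for fn in env_fns]
```

Using partial also sidesteps the late-binding pitfall of lambdas created in a loop, where every lambda would otherwise see the last value of the loop variable.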
action_space

gym.spaces.Space – The action space of the underlying environment. Does not block the execution. start() and initialize() must have been called.

close()[source]

Closes the subprocess.

initialize()[source]

Retrieves the observation space and the action space from the environment in the subprocess. This method blocks until the execution is finished. start() must have been called.

observation_space

gym.spaces.Space – The observation space of the underlying environment. Does not block the execution. start() and initialize() must have been called.

render(mode='human')[source]

Remotely calls gym.Env.render() in the underlying environment. This method blocks until execution is finished. start() and initialize() must have been called.

Parameters:mode (str) – The mode argument passed to gym.Env.render().
Returns:The value returned by gym.Env.render().
reset(**kwargs)[source]

Remotely calls gym.Env.reset() in the underlying environment. This method blocks until execution is finished. start() and initialize() must have been called.

Parameters:kwargs (dict) – Keyword arguments passed to gym.Env.reset().
Returns:The value returned by gym.Env.reset().
start()[source]

Starts the subprocess. Does not block. You should call initialize() afterwards.

step(action)[source]

Remotely calls gym.Env.step() in the underlying environment. This method blocks until execution is finished. start() and initialize() must have been called.

Parameters:action – The action argument passed to gym.Env.step().
Returns:tuple – A tuple of (observation, reward, terminal, info). The values returned by gym.Env.step().
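The blocking "remote call" style of reset() and step() can be sketched as a request/reply loop over a pipe. This document does not describe the transport the library actually uses, so the protocol below is an assumption: EchoEnv, worker and remote_call are hypothetical names, and a thread stands in for the subprocess to keep the example self-contained.

```python
import threading
from multiprocessing import Pipe


class EchoEnv:
    """Hypothetical environment: the observation is the last action."""

    def reset(self):
        return 0

    def step(self, action):
        return action, 1.0, action >= 3, {}


def worker(conn, env):
    # Serve (command, argument) requests until told to close.
    while True:
        command, argument = conn.recv()
        if command == "reset":
            conn.send(env.reset())
        elif command == "step":
            conn.send(env.step(argument))
        elif command == "close":
            conn.close()
            break


parent_conn, child_conn = Pipe()
# A thread stands in for the subprocess in this sketch.
thread = threading.Thread(target=worker, args=(child_conn, EchoEnv()))
thread.start()


def remote_call(command, argument=None):
    parent_conn.send((command, argument))
    return parent_conn.recv()  # blocks until the worker replies


initial = remote_call("reset")
result = remote_call("step", 3)
parent_conn.send(("close", None))
thread.join()
```

The caller sees an ordinary blocking method call, while the environment itself lives on the other end of the pipe.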
class actorcritic.multi_env._AutoResetWrapper(env)[source]

Bases: gym.core.Wrapper

reset(**kwargs)[source]

Resets the state of the environment and returns an initial observation.

Returns:observation (object) – the initial observation of the space.
step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters:action (object) – an action provided by the environment
Returns:
observation (object) – agent’s observation of the current environment
reward (float) – amount of reward returned after previous action
done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results
info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
actorcritic.multi_env.create_subprocess_envs(env_fns)[source]

Utility function that creates environments by calling the functions in env_fns and wrapping the returned environments in SubprocessEnvs. They will be started and initialized in parallel.

Parameters:env_fns (list of callable) – A list of functions that each return a gym.Env. The returned environments should not be instances of SubprocessEnv.
Returns:list of SubprocessEnv – A list of the created environments.