actorcritic.multi_env
Contains classes that provide the ability to run multiple environments in subprocesses.
Functions
create_subprocess_envs(env_fns) | Utility function that creates environments by calling the functions in env_fns and wrapping the returned environments in SubprocessEnv instances.
Classes
MultiEnv(envs) | An environment that maintains multiple SubprocessEnv instances and executes them in parallel.
SubprocessEnv(env_fn) | Maintains a gym.Env inside a subprocess, so it can run concurrently.
class actorcritic.multi_env.MultiEnv(envs)
Bases: object
An environment that maintains multiple SubprocessEnv instances and executes them in parallel.
The environments are reset automatically when a terminal state is reached, so reset() only has to be called once at the beginning.
__init__(envs)
Parameters: envs (list of SubprocessEnv) – The environments. The observation and action spaces must be equal across the environments.
action_space
gym.spaces.Space – The action space used by all environments.
observation_space
gym.spaces.Space – The observation space used by all environments.
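The automatic reset described for MultiEnv can be illustrated with a small, self-contained sketch. It does not use this package or gym: CountdownEnv and AutoReset are hypothetical names, and replacing the terminal observation with the first observation of the next episode is an assumption about how such a wrapper might behave, not a statement about the library's internals.

```python
class CountdownEnv:
    """Hypothetical toy environment: an episode ends after `length` steps."""

    def __init__(self, length=3):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # the observation is simply the step counter

    def step(self, action):
        self.t += 1
        terminal = self.t >= self.length
        return self.t, 1.0, terminal, {}


class AutoReset:
    """Hypothetical wrapper that resets the wrapped env when an episode ends."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        observation, reward, terminal, info = self.env.step(action)
        if terminal:
            # Assumption: hand back the first observation of the next episode
            # instead of the terminal one, so the caller can keep stepping.
            observation = self.env.reset()
        return observation, reward, terminal, info


env = AutoReset(CountdownEnv(length=3))
env.reset()  # called once, at the very beginning
observations = [env.step(action=None)[0] for _ in range(7)]
print(observations)  # [1, 2, 0, 1, 2, 0, 1]
```

The caller never calls reset() again after the first time; the terminal flag is still passed through so that episode boundaries remain visible.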
class actorcritic.multi_env.SubprocessEnv(env_fn)
Bases: gym.core.Env
Maintains a gym.Env inside a subprocess, so it can run concurrently. If the subprocess ends unexpectedly, it will be recreated automatically without interrupting the execution.
To use the subprocess, start() has to be called first. After that, initialize() has to be called to retrieve the observation space and the action space from the underlying environment. The two steps are separate so that multiple SubprocessEnv instances can be created and started without blocking the execution; starting a subprocess already creates the underlying gym.Env. Afterwards initialize(), which blocks the execution, can be called on each of them. See create_subprocess_envs(), which implements this idea.
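The two-phase start()/initialize() pattern can be sketched with the standard library alone. The following is a simplified illustration, not the library's implementation: ToyEnv, _worker and SubprocessEnvSketch are hypothetical names, the command protocol is invented for the example, the "fork" start method is assumed (POSIX only), and automatic recreation of a crashed subprocess is left out.

```python
import multiprocessing as mp


class ToyEnv:
    """Hypothetical stand-in for a gym.Env; the spaces are plain strings here."""
    observation_space = "Box(1,)"
    action_space = "Discrete(2)"

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, float(action), self.t >= 3, {}


def _worker(conn, env_fn):
    env = env_fn()  # the environment is built inside the subprocess
    while True:
        command, payload = conn.recv()
        if command == "spaces":
            conn.send((env.observation_space, env.action_space))
        elif command == "reset":
            conn.send(env.reset())
        elif command == "step":
            conn.send(env.step(payload))
        elif command == "close":
            conn.close()
            break


class SubprocessEnvSketch:
    def __init__(self, env_fn):
        self._env_fn = env_fn
        self.observation_space = None
        self.action_space = None

    def start(self):
        # Returns as soon as the child exists; env_fn runs in the child.
        ctx = mp.get_context("fork")  # assumption: POSIX platform
        self._conn, child_conn = ctx.Pipe()
        self._process = ctx.Process(
            target=_worker, args=(child_conn, self._env_fn), daemon=True)
        self._process.start()
        child_conn.close()

    def initialize(self):
        # Blocks until the child has built its environment and answered.
        self.observation_space, self.action_space = self._call("spaces")

    def _call(self, command, payload=None):
        self._conn.send((command, payload))
        return self._conn.recv()

    def reset(self):
        return self._call("reset")

    def step(self, action):
        return self._call("step", action)

    def close(self):
        self._conn.send(("close", None))
        self._process.join(timeout=5)


envs = [SubprocessEnvSketch(ToyEnv) for _ in range(2)]
for env in envs:
    env.start()        # non-blocking: all children build their env concurrently
for env in envs:
    env.initialize()   # blocking: wait for each child in turn
first = envs[0].reset()
result = envs[0].step(1)
for env in envs:
    env.close()
```

Starting every environment before initializing any of them means the expensive construction overlaps across subprocesses, and the blocking calls afterwards only wait for work that is already under way.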
__init__(env_fn)
Parameters: env_fn (callable) – A function that returns a gym.Env. It will be called inside the subprocess, so avoid referencing variables that exist only in the main process. It may be called multiple times, since the subprocess is recreated when it ends unexpectedly.
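Because env_fn is called later, and possibly more than once, it helps if the callable carries everything it needs. The sketch below shows one common pitfall when building a list of such callables in a loop, and one way around it. make_env is a hypothetical factory that returns a plain dict instead of a gym.Env so that the example runs without gym.

```python
import functools


def make_env(seed):
    """Hypothetical factory; a real one would build and return a gym.Env."""
    return {"seed": seed}


# Pitfall: each lambda looks up `seed` when it is called, not when it is
# defined, so all of them see the final loop value.
late_bound = [lambda: make_env(seed) for seed in range(3)]

# Safer: functools.partial stores the argument inside the callable itself.
bound = [functools.partial(make_env, seed) for seed in range(3)]

late_seeds = [fn()["seed"] for fn in late_bound]
bound_seeds = [fn()["seed"] for fn in bound]
print(late_seeds)   # [2, 2, 2]
print(bound_seeds)  # [0, 1, 2]
```

A partial over a module-level function also gives the same result every time it is called, which matters if the callable runs again after the subprocess has been recreated.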
action_space
gym.spaces.Space – The action space of the underlying environment. Does not block the execution. start() and initialize() must have been called.
initialize()
Retrieves the observation space and the action space from the environment in the subprocess. This method blocks until the execution is finished. start() must have been called.
observation_space
gym.spaces.Space – The observation space of the underlying environment. Does not block the execution. start() and initialize() must have been called.
render(mode='human')
Remotely calls gym.Env.render() in the subprocess. This method blocks until the execution is finished. start() and initialize() must have been called.
Parameters: mode (str) – The mode argument passed to gym.Env.render().
Returns: The value returned by gym.Env.render().
reset(**kwargs)
Remotely calls gym.Env.reset() in the underlying environment. This method blocks until the execution is finished. start() and initialize() must have been called.
Parameters: kwargs (dict) – Keyword arguments passed to gym.Env.reset().
Returns: The value returned by gym.Env.reset().
start()
Starts the subprocess. Does not block. You should call initialize() afterwards.
step(action)
Remotely calls gym.Env.step() in the underlying environment. This method blocks until the execution is finished. start() and initialize() must have been called.
Parameters: action – The action argument passed to gym.Env.step().
Returns: tuple – A tuple of (observation, reward, terminal, info), as returned by gym.Env.step().
class actorcritic.multi_env._AutoResetWrapper(env)
Bases: gym.core.Wrapper
reset(**kwargs)
Resets the state of the environment and returns an initial observation.
Returns: observation (object) – the initial observation of the space.
step(action)
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
Parameters: action (object) – an action provided by the agent
Returns:
observation (object) – agent’s observation of the current environment
reward (float) – amount of reward returned after previous action
done (boolean) – whether the episode has ended, in which case further step() calls will return undefined results
info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
actorcritic.multi_env.create_subprocess_envs(env_fns)
Utility function that creates environments by calling the functions in env_fns and wrapping the returned environments in SubprocessEnv instances. They will be started and initialized in parallel.
Parameters: env_fns (list of callable) – A list of functions that return a gym.Env. The returned environments should not be instances of SubprocessEnv.
Returns: list of SubprocessEnv – A list of the created environments.