actorcritic.examples.atari.a2c_acktr
An example of how to use A2C and ACKTR to learn to play an Atari game.
Functions

create_environments(env_id, num_envs)
    Creates multiple Atari environments that run in subprocesses.
create_optimizer(acktr, model, learning_rate)
    Creates an optimizer based on whether ACKTR or A2C is used.
load_model(saver, checkpoint_path, session)
    Loads the latest model checkpoint (with the neural network parameters) from a directory.
make_atari_env(env_id, render)
    Creates a gym.Env and wraps it with all Atari wrappers in actorcritic.envs.atari.wrappers.
save_model(saver, checkpoint_path, model_name, step, session)
    Saves a model checkpoint to a directory.
train_a2c_acktr(acktr, env_id, num_envs, num_steps, checkpoint_path, model_name, summary_path=None)
    Trains an Atari model using A2C or ACKTR.
actorcritic.examples.atari.a2c_acktr.create_environments(env_id, num_envs)

    Creates multiple Atari environments that run in subprocesses.

    Parameters:
        - env_id (string) – An id passed to gym.make() to create the environments.
        - num_envs (int) – The number of environments to create (one subprocess each).

    Returns:
        list of gym.Wrapper – The environments.
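The library's own subprocess machinery is not shown in this reference, but the general pattern behind create_environments, one environment per worker process driven over a pipe, can be sketched as follows. Everything here (CountingEnv, the command protocol, the function names) is a hypothetical stand-in for illustration, not the library's API:

```python
import multiprocessing as mp

class CountingEnv:
    """Hypothetical stand-in for an Atari environment."""
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 3, {}

def _worker(conn):
    # Each subprocess owns one environment and serves commands over its pipe.
    env = CountingEnv()
    while True:
        cmd, data = conn.recv()
        if cmd == "reset":
            conn.send(env.reset())
        elif cmd == "step":
            conn.send(env.step(data))
        else:  # "close"
            conn.close()
            return

def run_parallel_step(num_envs):
    # "fork" keeps this sketch runnable without a __main__ guard (POSIX only).
    ctx = mp.get_context("fork")
    parents, procs = [], []
    for _ in range(num_envs):
        parent, child = ctx.Pipe()
        proc = ctx.Process(target=_worker, args=(child,))
        proc.start()
        parents.append(parent)
        procs.append(proc)
    # Broadcast commands, then gather all replies: the envs run concurrently.
    for p in parents:
        p.send(("reset", None))
    obs = [p.recv() for p in parents]
    for p in parents:
        p.send(("step", 0))
    results = [p.recv() for p in parents]
    for p, proc in zip(parents, procs):
        p.send(("close", None))
        proc.join()
    return obs, results
```

Batching the send/recv pairs this way is what lets all environments advance in parallel while the trainer waits only once per step.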
actorcritic.examples.atari.a2c_acktr.create_optimizer(acktr, model, learning_rate)

    Creates an optimizer based on whether ACKTR or A2C is used. A2C uses the RMSProp optimizer; ACKTR uses the K-FAC optimizer. This function is not restricted to Atari models and can be used generally.

    Parameters:
        - acktr (bool) – Whether to use the optimizer of ACKTR or A2C.
        - model (ActorCriticModel) – A model that is needed for K-FAC to register the neural network layers and the predictive distributions.
        - learning_rate (float or tf.Tensor) – A learning rate for the optimizer.
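The selection rule documented above can be condensed into a small sketch. This is not the library's implementation, only an illustration of the documented behavior; the function name and return shape are invented for the example:

```python
def create_optimizer_sketch(acktr, learning_rate, model=None):
    """Illustrative sketch of the documented selection rule."""
    if acktr:
        # K-FAC needs the model so it can register the network's layers
        # and predictive distributions before approximating the Fisher.
        if model is None:
            raise ValueError("K-FAC requires a model to register layers with")
        return ("K-FAC", learning_rate, model)
    # A2C uses plain RMSProp, which needs no model information.
    return ("RMSProp", learning_rate, None)
```

The asymmetry is the point: only the K-FAC branch touches the model, which is why the real function takes a model argument even though A2C ignores it.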
actorcritic.examples.atari.a2c_acktr.load_model(saver, checkpoint_path, session)

    Loads the latest model checkpoint (with the neural network parameters) from a directory.

    Parameters:
        - saver (tf.train.Saver) – A saver object to restore the model.
        - checkpoint_path (string) – A directory where the checkpoint is loaded from.
        - session (tf.Session) – A session which will contain the loaded variable values.
actorcritic.examples.atari.a2c_acktr.make_atari_env(env_id, render)

    Creates a gym.Env and wraps it with all Atari wrappers in actorcritic.envs.atari.wrappers.

    Parameters:
        - env_id (string) – An id passed to gym.make() to create the environment.
        - render (bool) – Whether the environment should be rendered.

    Returns:
        gym.Env – The environment.
actorcritic.examples.atari.a2c_acktr.save_model(saver, checkpoint_path, model_name, step, session)

    Saves a model checkpoint to a directory.

    Parameters:
        - saver (tf.train.Saver) – A saver object to save the model.
        - checkpoint_path (string) – A directory where the model checkpoint will be saved.
        - model_name (string) – A name of the model. The checkpoint file in the checkpoint_path directory will have this name.
        - step (int or tf.Tensor) – A number that is appended to the checkpoint file name.
        - session (tf.Session) – A session whose variables will be saved.
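Assuming save_model forwards step to tf.train.Saver.save() as the global step (an assumption; the docstring only says the number is appended), the resulting checkpoint file prefix can be sketched as:

```python
import os

def checkpoint_prefix(checkpoint_path, model_name, step):
    # tf.train.Saver.save appends the global step to the save path, so a
    # model saved under model_name at step 1000 gets the on-disk prefix
    # "<checkpoint_path>/<model_name>-1000"; the actual files add suffixes
    # such as ".index", ".meta", and ".data-00000-of-00001".
    return os.path.join(checkpoint_path, "{}-{}".format(model_name, step))
```

This naming is also what lets load_model pick the latest checkpoint from the same directory by step number.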
actorcritic.examples.atari.a2c_acktr.train_a2c_acktr(acktr, env_id, num_envs, num_steps, checkpoint_path, model_name, summary_path=None)

    Trains an Atari model using A2C or ACKTR. Automatically saves and loads the trained model.

    Parameters:
        - acktr (bool) – Whether the ACKTR or the A2C algorithm should be used. A2C uses the RMSProp optimizer and 64 filters in the third convolutional layer of the neural network; ACKTR uses the K-FAC optimizer and 32 filters.
        - env_id (string) – An id passed to gym.make() to create the environments.
        - num_envs (int) – The number of environments that will be used (so num_envs subprocesses will be created). A2C normally uses 16; ACKTR normally uses 32.
        - num_steps (int) – The number of steps to take in each iteration. A2C normally uses 5; ACKTR normally uses 20.
        - checkpoint_path (string) – A directory where the model's checkpoints will be loaded and saved.
        - model_name (string) – A name of the model. The files in the checkpoint_path directory will have this name.
        - summary_path (string, optional) – A directory where the TensorBoard summaries will be saved. If not specified, no summaries will be saved.
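The per-algorithm defaults scattered through the parameter descriptions can be collected in one place. The dict below only restates the values from this reference; it is an illustrative summary, not part of the library:

```python
# Documented defaults for each algorithm (values taken from the parameter
# descriptions above; the dict itself is illustrative, not library code).
DEFAULTS = {
    "a2c":   {"optimizer": "RMSProp", "conv3_filters": 64,
              "num_envs": 16, "num_steps": 5},
    "acktr": {"optimizer": "K-FAC", "conv3_filters": 32,
              "num_envs": 32, "num_steps": 20},
}

def defaults_for(acktr):
    """Look up the documented defaults for the chosen algorithm."""
    return DEFAULTS["acktr" if acktr else "a2c"]
```

Note the trade-off the two rows encode: ACKTR's more sample-efficient K-FAC updates pair with more environments and longer rollouts per iteration, while A2C compensates with a wider third convolutional layer.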