actorcritic.examples.atari.a2c_acktr

An example of how to use A2C and ACKTR to learn to play an Atari game.

Functions

  • create_environments(env_id, num_envs) – Creates multiple Atari environments that run in subprocesses.
  • create_optimizer(acktr, model, learning_rate) – Creates an optimizer based on whether ACKTR or A2C is used.
  • load_model(saver, checkpoint_path, session) – Loads the latest model checkpoint (with the neural network parameters) from a directory.
  • make_atari_env(env_id, render) – Creates a gym.Env and wraps it with all Atari wrappers in actorcritic.envs.atari.wrappers.
  • save_model(saver, checkpoint_path, …) – Saves a model checkpoint to a directory.
  • train_a2c_acktr(acktr, env_id, num_envs, …) – Trains an Atari model using A2C or ACKTR.
actorcritic.examples.atari.a2c_acktr.create_environments(env_id, num_envs)[source]

Creates multiple Atari environments that run in subprocesses.

Parameters:
  • env_id (string) – An id passed to gym.make() to create the environments.
  • num_envs (int) – The number of environments (and subprocesses) that will be created.
Returns:
  list of gym.Wrapper – The environments.
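
Example (a minimal usage sketch; the environment id and the number of environments are illustrative):

    from actorcritic.examples.atari.a2c_acktr import create_environments

    # Spawn 16 environments, each running in its own subprocess.
    envs = create_environments("BreakoutNoFrameskip-v4", num_envs=16)

    observations = [env.reset() for env in envs]

    for env in envs:
        env.close()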

actorcritic.examples.atari.a2c_acktr.create_optimizer(acktr, model, learning_rate)[source]

Creates an optimizer based on whether ACKTR or A2C is used. A2C uses the RMSProp optimizer; ACKTR uses the K-FAC optimizer. This function is not restricted to Atari models and can be used more generally.

Parameters:
  • acktr (bool) – If True, the K-FAC optimizer (ACKTR) is created; otherwise the RMSProp optimizer (A2C).
  • model (ActorCriticModel) – A model that is needed for K-FAC to register the neural network layers and the predictive distributions.
  • learning_rate (float or tf.Tensor) – A learning rate for the optimizer.
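
Example (a sketch only; model is assumed to be an ActorCriticModel built elsewhere, and the learning rate value is illustrative):

    from actorcritic.examples.atari.a2c_acktr import create_optimizer

    # `model` is assumed to be an already-constructed ActorCriticModel.
    # acktr=False selects the RMSProp optimizer used by A2C.
    optimizer = create_optimizer(acktr=False, model=model, learning_rate=7e-4)
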
actorcritic.examples.atari.a2c_acktr.load_model(saver, checkpoint_path, session)[source]

Loads the latest model checkpoint (with the neural network parameters) from a directory.

Parameters:
  • saver (tf.train.Saver) – A saver object to restore the model.
  • checkpoint_path (string) – The directory from which the checkpoint is loaded.
  • session (tf.Session) – A session which will contain the loaded variable values.
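
Example (a minimal sketch; it assumes the model's variables have already been created in the default graph, and the checkpoint directory is illustrative):

    import tensorflow as tf

    from actorcritic.examples.atari.a2c_acktr import load_model

    # The saver must be created after the model's variables exist.
    saver = tf.train.Saver()

    with tf.Session() as session:
        # Restores the latest checkpoint found in "./checkpoints".
        load_model(saver, "./checkpoints", session)
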
actorcritic.examples.atari.a2c_acktr.make_atari_env(env_id, render)[source]

Creates a gym.Env and wraps it with all Atari wrappers in actorcritic.envs.atari.wrappers.

Parameters:
  • env_id (string) – An id passed to gym.make().
  • render (bool) – Whether this environment should be rendered.
Returns:
  gym.Env – The environment.
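
Example (a minimal usage sketch; the environment id is illustrative):

    from actorcritic.examples.atari.a2c_acktr import make_atari_env

    # Any Atari id accepted by gym.make() works here.
    env = make_atari_env("PongNoFrameskip-v4", render=False)

    observation = env.reset()
    env.close()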

actorcritic.examples.atari.a2c_acktr.save_model(saver, checkpoint_path, model_name, step, session)[source]

Saves a model checkpoint to a directory.

Parameters:
  • saver (tf.train.Saver) – A saver object to save the model.
  • checkpoint_path (string) – A directory where the model checkpoint will be saved.
  • model_name (string) – The name of the model. The checkpoint file in the checkpoint_path directory will have this name.
  • step (int or tf.Tensor) – A number that is appended to the checkpoint file name.
  • session (tf.Session) – A session whose variables will be saved.
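
Example (a minimal sketch; it assumes the model's variables already exist in the default graph, and the path, name, and step are illustrative):

    import tensorflow as tf

    from actorcritic.examples.atari.a2c_acktr import save_model

    saver = tf.train.Saver()

    with tf.Session() as session:
        session.run(tf.global_variables_initializer())
        # Writes checkpoint files named "pong-10000.*" into "./checkpoints".
        save_model(saver, "./checkpoints", "pong", 10000, session)
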
actorcritic.examples.atari.a2c_acktr.train_a2c_acktr(acktr, env_id, num_envs, num_steps, checkpoint_path, model_name, summary_path=None)[source]

Trains an Atari model using A2C or ACKTR. Automatically saves and loads the trained model.

Parameters:
  • acktr (bool) – Whether the ACKTR or the A2C algorithm should be used. A2C uses the RMSProp optimizer and 64 filters in the third convolutional layer of the neural network. ACKTR uses the K-FAC optimizer and 32 filters.
  • env_id (string) – An id passed to gym.make() to create the environments.
  • num_envs (int) – The number of environments that will be used (so num_envs subprocesses will be created). A2C normally uses 16. ACKTR normally uses 32.
  • num_steps (int) – The number of steps to take in each iteration. A2C normally uses 5. ACKTR normally uses 20.
  • checkpoint_path (string) – A directory where the model’s checkpoints are loaded from and saved to.
  • model_name (string) – The name of the model. The files in the checkpoint_path directory will have this name.
  • summary_path (string, optional) – A directory where the TensorBoard summaries will be saved. If not specified, no summaries will be saved.
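
Example (a minimal sketch using the typical ACKTR settings mentioned above; the environment id and paths are illustrative):

    from actorcritic.examples.atari.a2c_acktr import train_a2c_acktr

    # ACKTR normally uses 32 environments and 20 steps per iteration.
    train_a2c_acktr(
        acktr=True,
        env_id="BreakoutNoFrameskip-v4",
        num_envs=32,
        num_steps=20,
        checkpoint_path="./checkpoints",
        model_name="breakout",
        summary_path="./summaries",
    )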