actorcritic.examples.atari.a2c_acktr

An example of how to use A2C and ACKTR to learn to play an Atari game.

Functions

  • create_environments(env_id, num_envs) – Creates multiple Atari environments that run in subprocesses.
  • create_optimizer(acktr, model, learning_rate) – Creates an optimizer based on whether ACKTR or A2C is used.
  • load_model(saver, checkpoint_path, session) – Loads the latest model checkpoint (with the neural network parameters) from a directory.
  • make_atari_env(env_id, render) – Creates a gym.Env and wraps it with all Atari wrappers in actorcritic.envs.atari.wrappers.
  • save_model(saver, checkpoint_path, …) – Saves a model checkpoint to a directory.
  • train_a2c_acktr(acktr, env_id, num_envs, …) – Trains an Atari model using A2C or ACKTR.
actorcritic.examples.atari.a2c_acktr.create_environments(env_id, num_envs)[source]

Creates multiple Atari environments that run in subprocesses.

Parameters:
  • env_id (string) – An id passed to gym.make() to create the environments.
  • num_envs (int) – The number of environments (and subprocesses) that will be created.
Returns:
  list of gym.Wrapper – The environments.
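
Example (a minimal usage sketch; the environment id and the number of environments are illustrative):

    from actorcritic.examples.atari.a2c_acktr import create_environments

    # Spawn 16 environments, each running in its own subprocess.
    envs = create_environments("BreakoutNoFrameskip-v4", num_envs=16)

    observations = [env.reset() for env in envs]

    for env in envs:
        env.close()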

actorcritic.examples.atari.a2c_acktr.create_optimizer(acktr, model, learning_rate)[source]

Creates an optimizer based on whether ACKTR or A2C is used. A2C uses the RMSProp optimizer; ACKTR uses the K-FAC optimizer. This function is not restricted to Atari models and can be used more generally.

Parameters:
  • acktr (bool) – If True, the K-FAC optimizer (ACKTR) is created; otherwise the RMSProp optimizer (A2C).
  • model (ActorCriticModel) – A model that is needed for K-FAC to register the neural network layers and the predictive distributions.
  • learning_rate (float or tf.Tensor) – A learning rate for the optimizer.
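
Example (a sketch only; model is assumed to be an ActorCriticModel built elsewhere, and the learning rate value is illustrative):

    from actorcritic.examples.atari.a2c_acktr import create_optimizer

    # `model` is assumed to be an already-constructed ActorCriticModel.
    # acktr=False selects the RMSProp optimizer used by A2C.
    optimizer = create_optimizer(acktr=False, model=model, learning_rate=7e-4)
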
actorcritic.examples.atari.a2c_acktr.load_model(saver, checkpoint_path, session)[source]

Loads the latest model checkpoint (with the neural network parameters) from a directory.

Parameters:
  • saver (tf.train.Saver) – A saver object to restore the model.
  • checkpoint_path (string) – The directory from which the checkpoint is loaded.
  • session (tf.Session) – A session which will contain the loaded variable values.
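
Example (a minimal sketch; it assumes the model's variables have already been created in the default graph, and the checkpoint directory is illustrative):

    import tensorflow as tf

    from actorcritic.examples.atari.a2c_acktr import load_model

    # The saver must be created after the model's variables exist.
    saver = tf.train.Saver()

    with tf.Session() as session:
        # Restores the latest checkpoint found in "./checkpoints".
        load_model(saver, "./checkpoints", session)
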
actorcritic.examples.atari.a2c_acktr.make_atari_env(env_id, render)[source]

Creates a gym.Env and wraps it with all Atari wrappers in actorcritic.envs.atari.wrappers.

Parameters:
  • env_id (string) – An id passed to gym.make().
  • render (bool) – Whether this environment should be rendered.
Returns:
  gym.Env – The environment.
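
Example (a minimal usage sketch; the environment id is illustrative):

    from actorcritic.examples.atari.a2c_acktr import make_atari_env

    # Any Atari id accepted by gym.make() works here.
    env = make_atari_env("PongNoFrameskip-v4", render=False)

    observation = env.reset()
    env.close()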

actorcritic.examples.atari.a2c_acktr.save_model(saver, checkpoint_path, model_name, step, session)[source]

Saves a model checkpoint to a directory.

Parameters:
  • saver (tf.train.Saver) – A saver object to save the model.
  • checkpoint_path (string) – A directory where the model checkpoint will be saved.
  • model_name (string) – The name of the model. The checkpoint file in the checkpoint_path directory will have this name.
  • step (int or tf.Tensor) – A number that is appended to the checkpoint file name.
  • session (tf.Session) – A session whose variables will be saved.
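
Example (a minimal sketch; it assumes the model's variables already exist in the default graph, and the path, name, and step are illustrative):

    import tensorflow as tf

    from actorcritic.examples.atari.a2c_acktr import save_model

    saver = tf.train.Saver()

    with tf.Session() as session:
        session.run(tf.global_variables_initializer())
        # Writes checkpoint files named "pong-10000.*" into "./checkpoints".
        save_model(saver, "./checkpoints", "pong", 10000, session)
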
actorcritic.examples.atari.a2c_acktr.train_a2c_acktr(acktr, env_id, num_envs, num_steps, checkpoint_path, model_name, summary_path=None)[source]

Trains an Atari model using A2C or ACKTR. Automatically saves and loads the trained model.

Parameters:
  • acktr (bool) – Whether the ACKTR or the A2C algorithm should be used. A2C uses the RMSProp optimizer and 64 filters in the third convolutional layer of the neural network. ACKTR uses the K-FAC optimizer and 32 filters.
  • env_id (string) – An id passed to gym.make() to create the environments.
  • num_envs (int) – The number of environments that will be used (so num_envs subprocesses will be created). A2C normally uses 16. ACKTR normally uses 32.
  • num_steps (int) – The number of steps to take in each iteration. A2C normally uses 5. ACKTR normally uses 20.
  • checkpoint_path (string) – A directory where the model’s checkpoints are loaded from and saved to.
  • model_name (string) – The name of the model. The files in the checkpoint_path directory will have this name.
  • summary_path (string, optional) – A directory where the TensorBoard summaries will be saved. If not specified, no summaries will be saved.
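
Example (a minimal sketch using the typical ACKTR settings mentioned above; the environment id and paths are illustrative):

    from actorcritic.examples.atari.a2c_acktr import train_a2c_acktr

    # ACKTR normally uses 32 environments and 20 steps per iteration.
    train_a2c_acktr(
        acktr=True,
        env_id="BreakoutNoFrameskip-v4",
        num_envs=32,
        num_steps=20,
        checkpoint_path="./checkpoints",
        model_name="breakout",
        summary_path="./summaries",
    )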