actorcritic.envs.atari.model

An implementation of an actor-critic model aimed at Atari games.

Classes

AtariModel(observation_space, action_space) An ActorCriticModel that follows the A3C and ACKTR papers.
class actorcritic.envs.atari.model.AtariModel(observation_space, action_space, conv3_num_filters=64, random_seed=None, name=None)[source]

Bases: actorcritic.model.ActorCriticModel

An ActorCriticModel that follows the A3C and ACKTR papers.

The observations are passed through three convolutional layers followed by a fully connected layer, each using rectified linear (ReLU) activations. The policy and the baseline each add a separate fully connected layer on top of the last hidden fully connected layer. The policy layer has one unit per action, and its outputs are used as the logits of a categorical distribution (softmax). The baseline layer has a single unit that represents the state value.

The weights of the layers are orthogonally initialized.
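As a minimal sketch (assuming the TensorFlow 1.x layers API that the placeholder-based attributes of this class suggest), orthogonal initialization amounts to passing a kernel initializer to each layer; the tensor names here are hypothetical:

    import tensorflow as tf

    # Hypothetical flattened activations, for illustration only.
    hidden = tf.placeholder(tf.float32, [None, 3136])
    fc = tf.layers.dense(hidden, units=512, activation=tf.nn.relu,
                         kernel_initializer=tf.orthogonal_initializer())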

Detailed network architecture:

  • Conv2D: 32 filters 8x8, stride 4
  • ReLU
  • Conv2D: 64 filters 4x4, stride 2
  • ReLU
  • Conv2D: 64 filters 3x3, stride 1 (the number of filters is set by the conv3_num_filters argument)
  • Flatten
  • Fully connected: 512 units
  • ReLU
  • Fully connected (policy): one unit per action
  • Fully connected (baseline): 1 unit

A2C uses 64 filters in the third convolutional layer. ACKTR uses 32.

The policy is a SoftmaxPolicy. The baseline is a StateValueFunction.
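The stack described above can be sketched as follows. This is a hedged illustration rather than the module's actual code: it assumes the TensorFlow 1.x layers API, NHWC observations already cast to float, and a hypothetical helper name build_atari_net:

    import tensorflow as tf

    def build_atari_net(observations, num_actions, conv3_num_filters=64):
        # Orthogonally initialized weights, as described above.
        init = tf.orthogonal_initializer()
        conv1 = tf.layers.conv2d(observations, 32, 8, strides=4,
                                 activation=tf.nn.relu, kernel_initializer=init)
        conv2 = tf.layers.conv2d(conv1, 64, 4, strides=2,
                                 activation=tf.nn.relu, kernel_initializer=init)
        conv3 = tf.layers.conv2d(conv2, conv3_num_filters, 3, strides=1,
                                 activation=tf.nn.relu, kernel_initializer=init)
        flat = tf.layers.flatten(conv3)
        hidden = tf.layers.dense(flat, 512, activation=tf.nn.relu,
                                 kernel_initializer=init)
        # Two separate heads share the last hidden layer.
        policy_logits = tf.layers.dense(hidden, num_actions,
                                        kernel_initializer=init)
        baseline_value = tf.layers.dense(hidden, 1, kernel_initializer=init)
        return policy_logits, baseline_value

    # Example: 84x84 gray-scale frames, 4 stacked, 4 actions.
    obs = tf.placeholder(tf.float32, [None, 84, 84, 4])
    logits, value = build_atari_net(obs, num_actions=4)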

See also

This network architecture was originally used in the DQN paper (Mnih et al., 2015, "Human-level control through deep reinforcement learning"): https://www.nature.com/articles/nature14236

__init__(observation_space, action_space, conv3_num_filters=64, random_seed=None, name=None)[source]
Parameters:
  • observation_space (gym.spaces.Space) – A space that determines the shape of the observations_placeholder and the bootstrap_observations_placeholder.
  • action_space (gym.spaces.Space) – A space that determines the shape of the actions_placeholder.
  • conv3_num_filters (int, optional) – Number of filters used for the third convolutional layer, defaults to 64. ACKTR uses 32.
  • random_seed (int, optional) – A random seed used for sampling from the actorcritic.policies.SoftmaxPolicy.
  • name (string, optional) – A name for this model.
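A hypothetical construction, assuming a standard Gym Atari environment supplies the spaces (in practice the observations would typically be preprocessed, e.g. resized, gray-scaled, and frame-stacked, before reaching the model):

    import gym
    from actorcritic.envs.atari.model import AtariModel

    env = gym.make('BreakoutNoFrameskip-v4')
    model = AtariModel(env.observation_space, env.action_space,
                       conv3_num_filters=32,  # e.g. the ACKTR variant
                       random_seed=0, name='atari_model')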
register_layers(layer_collection)[source]

Registers the layers of this model (the neural network) in the specified kfac.LayerCollection, which is required for K-FAC.

Parameters:
  • layer_collection (kfac.LayerCollection) – A layer collection used by the KfacOptimizer.
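A sketch of how this might be wired up with the kfac package; the KfacOptimizer arguments below are illustrative placeholders, not values taken from this module:

    import kfac

    layer_collection = kfac.LayerCollection()
    model.register_layers(layer_collection)

    # Hypothetical optimizer setup using the registered layers.
    optimizer = kfac.KfacOptimizer(learning_rate=0.25,
                                   cov_ema_decay=0.95,
                                   damping=0.001,
                                   layer_collection=layer_collection)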