actorcritic.envs.atari.model
An implementation of an actor-critic model that is aimed at Atari games.
Classes
AtariModel(observation_space, action_space) – An ActorCriticModel that follows the A3C and ACKTR papers.
class actorcritic.envs.atari.model.AtariModel(observation_space, action_space, conv3_num_filters=64, random_seed=None, name=None)
Bases: actorcritic.model.ActorCriticModel
An ActorCriticModel that follows the A3C and ACKTR papers.
The observations are fed into three convolutional layers followed by a fully connected layer, each using a rectified linear unit (ReLU) activation. The policy and the baseline each have their own fully connected layer on top of this last hidden layer. The policy layer has one unit per action, and its outputs are used as the logits of a categorical (softmax) distribution. The baseline layer has a single unit whose output is the estimated value of the state.
The weights of the layers are orthogonally initialized.
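The documentation does not say how the orthogonal weights are produced or scaled. One common recipe is to orthonormalize a random Gaussian matrix and multiply it by a gain factor; the sketch below does this with modified Gram-Schmidt in pure Python. The function `orthogonal_init` and its `gain` and `seed` parameters are assumptions for illustration, not part of actorcritic.

```python
import math
import random

def orthogonal_init(rows, cols, gain=1.0, seed=None):
    """Return a rows x cols matrix (list of lists) with orthonormal columns
    (if rows >= cols) or orthonormal rows (if rows < cols), scaled by gain."""
    rng = random.Random(seed)
    # Orthonormalize along the smaller dimension: `count` vectors of size `length`.
    length, count = max(rows, cols), min(rows, cols)
    basis = []
    while len(basis) < count:
        # Start from a random Gaussian vector.
        v = [rng.gauss(0.0, 1.0) for _ in range(length)]
        # Modified Gram-Schmidt: remove components along the vectors found so far.
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-8:  # otherwise retry (degenerate draw, very unlikely)
            basis.append([x / norm for x in v])
    if rows >= cols:
        # Basis vectors become the columns.
        return [[gain * basis[j][i] for j in range(cols)] for i in range(rows)]
    # Basis vectors become the rows.
    return [[gain * basis[i][j] for j in range(cols)] for i in range(rows)]
```

For a convolutional kernel, a typical approach is to treat it as a matrix with kernel_height * kernel_width * in_channels rows and out_channels columns before reshaping it back.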
Detailed network architecture:
- Conv2D: 32 filters 8x8, stride 4
- ReLU
- Conv2D: 64 filters 4x4, stride 2
- ReLU
- Conv2D: 64 filters 3x3, stride 1 (the number of filters is set by conv3_num_filters)
- ReLU
- Flatten
- Fully connected: 512 units
- ReLU
- Two parallel fully connected heads: policy with one unit per action, baseline with 1 unit
A2C uses 64 filters in the third convolutional layer. ACKTR uses 32.
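The effect of conv3_num_filters on the size of the flattened feature vector can be checked with the usual convolution output-size formula. This sketch assumes an 84x84 input and "valid" (no) padding; neither is stated on this page, so treat both as assumptions. The helper names are hypothetical.

```python
def conv_output_size(size, kernel, stride):
    """Spatial output size of a convolution with 'valid' (no) padding."""
    return (size - kernel) // stride + 1

def flattened_size(height, width, conv3_num_filters=64):
    """Return (height, width, flattened length) after the three conv layers."""
    # (kernel, stride) of the three convolutional layers listed above.
    for kernel, stride in [(8, 4), (4, 2), (3, 1)]:
        height = conv_output_size(height, kernel, stride)
        width = conv_output_size(width, kernel, stride)
    return height, width, height * width * conv3_num_filters
```

Under these assumptions the spatial size shrinks 84 → 20 → 9 → 7, so the 512-unit layer receives 3136 inputs with 64 filters (A2C) and 1568 with 32 filters (ACKTR).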
The policy is a SoftmaxPolicy. The baseline is a StateValueFunction.
See also
This network architecture was originally used in: https://www.nature.com/articles/nature14236
__init__(observation_space, action_space, conv3_num_filters=64, random_seed=None, name=None)
Parameters:
- observation_space (gym.spaces.Space) – A space that determines the shape of the observations_placeholder and the bootstrap_observations_placeholder.
- action_space (gym.spaces.Space) – A space that determines the shape of the actions_placeholder.
- conv3_num_filters (int, optional) – Number of filters used by the third convolutional layer. Defaults to 64; ACKTR uses 32.
- random_seed (int, optional) – A random seed used for sampling from the actorcritic.policies.SoftmaxPolicy.
- name (string, optional) – A name for this model.
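The random_seed parameter matters because actions are sampled from the policy rather than picked greedily. The pure-Python sketch below illustrates what a seed buys you: the same logits and the same seed yield the same sequence of sampled actions. It is an analogy only, not how actorcritic implements SoftmaxPolicy sampling; `sample_actions` is a hypothetical helper.

```python
import math
import random

def sample_actions(logits, num_samples, random_seed=None):
    """Draw actions from a categorical distribution parameterized by logits."""
    # Unnormalized softmax weights; random.choices normalizes them itself.
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]
    # A dedicated generator seeded with random_seed makes the draws reproducible.
    rng = random.Random(random_seed)
    return rng.choices(range(len(logits)), weights=weights, k=num_samples)
```

Calling `sample_actions([2.0, 1.0, 0.1, -1.0], 20, random_seed=42)` twice returns identical lists, while `random_seed=None` seeds from system entropy and generally gives different draws.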