actorcritic.model

Contains the base class of actor-critic models.

Classes

ActorCriticModel(observation_space, action_space) – Represents a model (e.g. a neural net) that provides the functionality required for actor-critic algorithms.
class actorcritic.model.ActorCriticModel(observation_space, action_space)[source]

Bases: object

Represents a model (e.g. a neural net) that provides the functionality required for actor-critic algorithms: a policy, a baseline (which is subtracted from the target values to compute the advantage), the values used for bootstrapping from next observations (ideally the values of the baseline), and the placeholders for the sampled data.
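The role of the baseline and of the bootstrap values can be made concrete with a small sketch (not part of this class); the function name, the array layout and gamma are assumptions used only for illustration:

    import numpy as np

    # Hypothetical sketch: the bootstrap value estimates the return after the
    # last observed step, and the baseline is subtracted from the targets to
    # form the advantages used by actor-critic updates.
    def n_step_targets_and_advantages(rewards, terminals, baseline_values,
                                      bootstrap_value, gamma=0.99):
        """rewards, terminals, baseline_values: arrays of shape [T];
        bootstrap_value: value estimate of the observation after step T - 1."""
        targets = np.zeros(len(rewards), dtype=np.float64)
        ret = bootstrap_value
        for t in reversed(range(len(rewards))):
            ret = rewards[t] + gamma * ret * (1.0 - terminals[t])
            targets[t] = ret
        advantages = targets - np.asarray(baseline_values)  # baseline reduces variance
        return targets, advantages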

__init__(observation_space, action_space)[source]
Parameters:
  • observation_space – The space of the observations.
  • action_space – The space of the actions.
actions_placeholder

tf.Tensor – The placeholder for the sampled actions.

baseline

Baseline – The baseline used by this model.

bootstrap_observations_placeholder

tf.Tensor – The placeholder for the sampled next observations. These are used to compute the bootstrap_values.

bootstrap_values

tf.Tensor – The bootstrapped values that are computed based on the observations passed to the bootstrap_observations_placeholder.

observations_placeholder

tf.Tensor – The placeholder for the sampled observations.

policy

Policy – The policy used by this model.

register_layers(layer_collection)[source]

Registers the layers of this model (neural net) in the specified kfac.LayerCollection (required for K-FAC).

Parameters: layer_collection (kfac.LayerCollection) – A layer collection used by the KfacOptimizer.
Raises: NotImplementedError – If this model does not support K-FAC.
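A hedged sketch of what a concrete implementation might register, assuming a model with a single fully connected layer; the self._hidden_* tensors are hypothetical attributes of the subclass, while register_fully_connected is a standard registration method of kfac.LayerCollection:

    def register_layers(self, layer_collection):
        # Hypothetical example for a model with one dense layer; the
        # 'self._hidden_*' tensors are assumed attributes, not part of
        # this base class.
        layer_collection.register_fully_connected(
            params=(self._hidden_weights, self._hidden_bias),
            inputs=self._hidden_inputs,    # activations fed into the layer
            outputs=self._hidden_preacts)  # pre-activation outputs of the layer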
register_predictive_distributions(layer_collection, random_seed=None)[source]

Registers the predictive distributions of the policy and the baseline in the specified kfac.LayerCollection (required for K-FAC).

Parameters:
  • layer_collection (kfac.LayerCollection) – A layer collection used by the KfacOptimizer.
  • random_seed (int, optional) – A random seed used for sampling from the predictive distributions.
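A hedged sketch of an implementation, assuming a discrete (categorical) policy and a scalar state-value baseline; self._policy_logits and self._baseline_values are hypothetical attributes, while the two registration calls belong to kfac.LayerCollection:

    def register_predictive_distributions(self, layer_collection, random_seed=None):
        # Hypothetical: the policy output is treated as a categorical
        # distribution over action logits, the baseline as a normal
        # distribution around its value estimate.
        layer_collection.register_categorical_predictive_distribution(
            logits=self._policy_logits, seed=random_seed)
        layer_collection.register_normal_predictive_distribution(
            mean=self._baseline_values, seed=random_seed)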
rewards_placeholder

tf.Tensor – The placeholder for the sampled rewards (scalars).

sample_actions(observations, session)[source]

Samples actions from the policy based on the specified observations.

Parameters:
  • observations – The observations that will be passed to the observations_placeholder.
  • session (tf.Session) – A session that will be used to compute the values.
Returns:

list of list – A list of lists of actions. The shape equals the shape of observations.
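A hedged usage sketch; MyActorCriticModel is an assumed concrete subclass, and the double list nesting is an assumption that mirrors the shape of the observations passed in:

    import gym
    import tensorflow as tf

    # Hypothetical usage: one environment, stepped with sampled actions.
    env = gym.make('CartPole-v1')
    model = MyActorCriticModel(env.observation_space, env.action_space)  # assumed subclass

    with tf.Session() as session:
        session.run(tf.global_variables_initializer())
        observation = env.reset()
        for _ in range(100):
            actions = model.sample_actions([[observation]], session)
            observation, reward, done, _ = env.step(actions[0][0])
            if done:
                observation = env.reset()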

select_max_actions(observations, session)[source]

Selects the actions with the highest probability (the mode of the policy) for the specified observations.

Parameters:
  • observations – The observations that will be passed to the observations_placeholder.
  • session (tf.Session) – A session that will be used to compute the values.
Returns:

list of list – A list of lists of actions. The shape equals the shape of observations.
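For evaluation, the sampling sketch above applies unchanged with greedy (mode) actions instead of samples:

    # Hypothetical evaluation step, continuing the sketch above.
    actions = model.select_max_actions([[observation]], session)
    observation, reward, done, _ = env.step(actions[0][0])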

terminals_placeholder

tf.Tensor – The placeholder for the sampled terminals (booleans).
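Taken together, the placeholders are typically fed with one batch of sampled experience per training step. A hedged sketch, where train_op and the *_batch arrays are assumptions, not part of this module:

    # Hypothetical training step: feed sampled experience into the placeholders.
    feed_dict = {
        model.observations_placeholder: observations_batch,
        model.actions_placeholder: actions_batch,
        model.rewards_placeholder: rewards_batch,
        model.terminals_placeholder: terminals_batch,
        model.bootstrap_observations_placeholder: next_observations_batch,
    }
    session.run(train_op, feed_dict=feed_dict)  # 'train_op' is an assumed update op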