actorcritic.model
Contains the base class of actor-critic models.
Classes
ActorCriticModel(observation_space, action_space) | Represents a model (e.g. a neural net) that provides the functionality required by actor-critic algorithms. |
-
class actorcritic.model.ActorCriticModel(observation_space, action_space)
Bases: object
Represents a model (e.g. a neural net) that provides the functionality required by actor-critic algorithms. It provides a policy, a baseline (which is subtracted from the target values to compute the advantage), the values used for bootstrapping from the next observations (ideally the values of the baseline), and the placeholders.
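The relationship between target values, baseline, and bootstrap values described above can be illustrated with a small NumPy sketch. It is not taken from this library: the function name `n_step_advantages`, the `discount` parameter, and the way terminals cut off bootstrapping are assumptions chosen for illustration, and the actual algorithms may compute targets differently.

```python
import numpy as np


def n_step_advantages(rewards, terminals, baseline_values, bootstrap_value, discount=0.99):
    """Hypothetical helper: discounted targets and advantages for one trajectory."""
    rewards = np.asarray(rewards, dtype=np.float64)
    terminals = np.asarray(terminals, dtype=bool)
    baseline_values = np.asarray(baseline_values, dtype=np.float64)

    targets = np.empty_like(rewards)
    # Start from the value of the observation that follows the last step.
    next_return = float(bootstrap_value)
    for t in reversed(range(len(rewards))):
        # Assumption: a terminal step ends the episode, so nothing is
        # bootstrapped past it.
        if terminals[t]:
            next_return = 0.0
        next_return = rewards[t] + discount * next_return
        targets[t] = next_return

    # The advantage is the target value minus the baseline.
    advantages = targets - baseline_values
    return targets, advantages


targets, advantages = n_step_advantages(
    rewards=[1.0, 1.0, 1.0],
    terminals=[False, False, False],
    baseline_values=[2.0, 2.0, 2.0],
    bootstrap_value=4.0,
    discount=0.5,
)
```

In this sketch, `baseline_values` plays the role of the baseline and `bootstrap_value` the role of the value computed from the next observation.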
-
__init__(observation_space, action_space)
Parameters:
- observation_space (gym.spaces.Space) – A space that determines the shape of the observations_placeholder and the bootstrap_observations_placeholder.
- action_space (gym.spaces.Space) – A space that determines the shape of the actions_placeholder.
-
actions_placeholder
tf.Tensor – The placeholder for the sampled actions.
-
bootstrap_observations_placeholder
tf.Tensor – The placeholder for the sampled next observations. These are used to compute the bootstrap_values.
-
bootstrap_values
tf.Tensor – The bootstrapped values that are computed from the observations passed to the bootstrap_observations_placeholder.
-
observations_placeholder
tf.Tensor – The placeholder for the sampled observations.
-
register_layers(layer_collection)
Registers the layers of this model (neural net) in the specified kfac.LayerCollection (required for K-FAC).
Parameters: layer_collection (kfac.LayerCollection) – A layer collection used by the KfacOptimizer.
Raises: NotImplementedError – If this model does not support K-FAC.
-
register_predictive_distributions(layer_collection, random_seed=None)
Registers the predictive distributions of the policy and the baseline in the specified kfac.LayerCollection (required for K-FAC).
Parameters:
- layer_collection (kfac.LayerCollection) – A layer collection used by the KfacOptimizer.
- random_seed (int, optional) – A random seed used for sampling from the predictive distributions.
-
rewards_placeholder
tf.Tensor – The placeholder for the sampled rewards (scalars).
-
sample_actions(observations, session)
Samples actions from the policy based on the specified observations.
Parameters:
- observations – The observations that will be passed to the observations_placeholder.
- session (tf.Session) – A session that will be used to compute the values.
Returns: list of list – A list of lists of actions. The shape equals the shape of observations.
-
select_max_actions(observations, session)
Selects the actions that have the highest probability under the policy (the mode) based on the specified observations.
Parameters:
- observations – The observations that will be passed to the observations_placeholder.
- session (tf.Session) – A session that will be used to compute the values.
Returns: list of list – A list of lists of actions. The shape equals the shape of observations.
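The difference between sampling actions and selecting the most probable actions can be sketched outside of TensorFlow. The example below assumes a discrete action space with explicit per-action probabilities. The helper names `sample_categorical` and `mode_categorical` are hypothetical and not part of this module, and the real model may use a different policy distribution.

```python
import numpy as np


def sample_categorical(action_probabilities, rng):
    """Hypothetical helper: draws one action per row of probabilities."""
    return [int(rng.choice(len(p), p=p)) for p in action_probabilities]


def mode_categorical(action_probabilities):
    """Hypothetical helper: picks the most probable action (the mode) per row."""
    return [int(np.argmax(p)) for p in action_probabilities]


# Two observations, three possible actions each (assumed example values).
probabilities = np.array([[0.1, 0.7, 0.2],
                          [0.6, 0.3, 0.1]])
rng = np.random.default_rng(0)

sampled = sample_categorical(probabilities, rng)  # stochastic, varies with the seed
greedy = mode_categorical(probabilities)          # deterministic
```

Sampling keeps exploring less likely actions and is the usual choice during training, whereas taking the mode is deterministic and is typically used when evaluating a trained policy.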
-
terminals_placeholder
tf.Tensor – The placeholder for the sampled terminals (booleans).
-