actorcritic.model

Contains the base class of actor-critic models.

Classes

ActorCriticModel(observation_space, action_space)
    Represents a model (e.g. a neural net) that provides the functionalities required for actor-critic algorithms.
class actorcritic.model.ActorCriticModel(observation_space, action_space)
    Bases: object

    Represents a model (e.g. a neural net) that provides the functionalities required for actor-critic algorithms: a policy, a baseline (which is subtracted from the target values to compute the advantage), the values used for bootstrapping from next observations (ideally the values of the baseline), and the corresponding placeholders.
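The concrete class is TensorFlow-based, but the interface it documents can be illustrated with a minimal pure-Python stand-in. Everything inside this sketch (the class name, the random policy, the fixed mode) is hypothetical; only the method names and the list-of-lists shapes come from the reference below.

```python
import random

class ToyActorCriticModel:
    """Illustrative stand-in for ActorCriticModel (not the real TF class).

    Shows the documented interface shape: sample_actions and
    select_max_actions take observations (a list of lists) and return
    actions with the same outer shape.
    """

    def __init__(self, num_actions):
        # Stands in for the gym.spaces action_space of the real class.
        self.num_actions = num_actions

    def sample_actions(self, observations, session=None):
        # The real model samples from the policy distribution inside a
        # tf.Session; here we sample uniformly at random for illustration.
        return [[random.randrange(self.num_actions) for _ in batch]
                for batch in observations]

    def select_max_actions(self, observations, session=None):
        # The real model returns the mode of the policy; a fixed action
        # serves as a placeholder here.
        return [[0 for _ in batch] for batch in observations]
```

Note how the returned structure mirrors the observations: one list of actions per inner list of observations.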
    __init__(observation_space, action_space)
        Parameters:
            - observation_space (gym.spaces.Space) – A space that determines the shape of the observations_placeholder and the bootstrap_observations_placeholder.
            - action_space (gym.spaces.Space) – A space that determines the shape of the actions_placeholder.
    actions_placeholder
        tf.Tensor – The placeholder for the sampled actions.
    bootstrap_observations_placeholder
        tf.Tensor – The placeholder for the sampled next observations. These are used to compute the bootstrap_values.
    bootstrap_values
        tf.Tensor – The bootstrapped values that are computed from the observations passed to the bootstrap_observations_placeholder.
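The class description above states that the baseline is subtracted from the target values to compute the advantage, and that the bootstrap values come from the next observations. A minimal sketch of how those pieces are typically combined (the helper name and the one-step return are assumptions, not part of this API):

```python
def one_step_advantages(rewards, baselines, bootstrap_values, terminals,
                        discount=0.99):
    """Hypothetical helper: one-step advantage estimates.

    target    = r + discount * V(next_obs)   (zeroed at terminal states)
    advantage = target - baseline
    """
    advantages = []
    for r, v, v_next, done in zip(rewards, baselines, bootstrap_values,
                                  terminals):
        target = r + (0.0 if done else discount * v_next)
        advantages.append(target - v)
    return advantages
```

In the real class, the baselines and bootstrap values would be fetched from the graph via a tf.Session before a computation like this one.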
    observations_placeholder
        tf.Tensor – The placeholder for the sampled observations.
    register_layers(layer_collection)
        Registers the layers of this model (neural net) in the specified kfac.LayerCollection (required for K-FAC).

        Parameters:
            layer_collection (kfac.LayerCollection) – A layer collection used by the KfacOptimizer.

        Raises:
            NotImplementedError – If this model does not support K-FAC.
    register_predictive_distributions(layer_collection, random_seed=None)
        Registers the predictive distributions of the policy and the baseline in the specified kfac.LayerCollection (required for K-FAC).

        Parameters:
            - layer_collection (kfac.LayerCollection) – A layer collection used by the KfacOptimizer.
            - random_seed (int, optional) – A random seed used for sampling from the predictive distributions.
    rewards_placeholder
        tf.Tensor – The placeholder for the sampled rewards (scalars).
    sample_actions(observations, session)
        Samples actions from the policy based on the specified observations.

        Parameters:
            - observations – The observations that will be passed to the observations_placeholder.
            - session (tf.Session) – A session that will be used to compute the values.

        Returns:
            list of list – A list of lists of actions. The shape equals the shape of the observations.
    select_max_actions(observations, session)
        Selects the actions from the policy that have the highest probability (the mode) based on the specified observations.

        Parameters:
            - observations – The observations that will be passed to the observations_placeholder.
            - session (tf.Session) – A session that will be used to compute the values.

        Returns:
            list of list – A list of lists of actions. The shape equals the shape of the observations.
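The two selection methods above differ only in whether actions are drawn stochastically (exploration, typically during training) or taken as the mode (exploitation, typically during evaluation). A small sketch of that split; the wrapper function and the stub model are hypothetical, only the two method names come from this reference:

```python
class GreedyStub:
    """Hypothetical stand-in with the documented interface; the real
    model would be an ActorCriticModel subclass driven by a tf.Session."""

    def sample_actions(self, observations, session=None):
        # Pretend the stochastic policy always samples action 1.
        return [[1 for _ in batch] for batch in observations]

    def select_max_actions(self, observations, session=None):
        # Pretend the mode of the policy is always action 0.
        return [[0 for _ in batch] for batch in observations]

def choose_actions(model, observations, session=None, evaluate=False):
    # Sample while training (exploration); take the mode when
    # evaluating the learned policy (exploitation).
    if evaluate:
        return model.select_max_actions(observations, session)
    return model.sample_actions(observations, session)
```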
    terminals_placeholder
        tf.Tensor – The placeholder for the sampled terminals (booleans).