actorcritic.policies

Contains policies that determine the behavior of an agent.

Classes

DistributionPolicy(distribution, actions[, …])
    Base class for stochastic policies that follow a concrete tf.distributions.Distribution.

Policy
    Base class for stochastic policies.

SoftmaxPolicy(logits, actions[, …])
    A stochastic policy that follows a categorical distribution.
class actorcritic.policies.DistributionPolicy(distribution, actions, random_seed=None)

    Bases: actorcritic.policies.Policy

    Base class for stochastic policies that follow a concrete
    tf.distributions.Distribution. Implements the required methods based on
    this distribution.

    __init__(distribution, actions, random_seed=None)

        Parameters:
            distribution (tf.distributions.Distribution) – The distribution.
            actions (tf.Tensor) – The input actions used to compute the
                log-probabilities. Must have the same shape as the inputs.
            random_seed (int, optional) – A random seed used for sampling.
    entropy
        tf.Tensor – Computes the entropy of this policy based on the inputs
        that are provided for computing the probabilities. The shape equals
        the shape of the inputs.

    log_prob
        tf.Tensor – Computes the log-probability of the given actions based
        on the inputs that are provided for computing the probabilities. The
        shape equals the shape of the actions and the inputs.

    mode
        tf.Tensor – Selects the actions from this policy that have the
        highest probability (the mode), based on the inputs that are provided
        for computing the probabilities. The shape equals the shape of the
        inputs.

    sample
        tf.Tensor – Samples actions from this policy based on the inputs that
        are provided for computing the probabilities. The shape equals the
        shape of the inputs.
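The point of DistributionPolicy is that every operation is forwarded to the wrapped distribution. As a rough, framework-free sketch of that delegation pattern (the `StubDistribution` and `SketchDistributionPolicy` classes below are hypothetical; the stub stands in for a `tf.distributions.Distribution` such as a Bernoulli):

```python
import math
import random


class StubDistribution:
    """Hypothetical stand-in for a tf.distributions.Distribution (Bernoulli)."""

    def __init__(self, p):
        self.p = p  # probability of action 1

    def log_prob(self, action):
        return math.log(self.p if action == 1 else 1.0 - self.p)

    def entropy(self):
        p = self.p
        return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

    def mode(self):
        return 1 if self.p >= 0.5 else 0

    def sample(self, rng):
        return 1 if rng.random() < self.p else 0


class SketchDistributionPolicy:
    """Sketch of DistributionPolicy: each property delegates to the distribution."""

    def __init__(self, distribution, actions, random_seed=None):
        self.distribution = distribution
        self.actions = actions
        self.rng = random.Random(random_seed)

    @property
    def log_prob(self):
        # Log-probability of each of the given input actions.
        return [self.distribution.log_prob(a) for a in self.actions]

    @property
    def entropy(self):
        return self.distribution.entropy()

    @property
    def mode(self):
        return self.distribution.mode()

    @property
    def sample(self):
        return self.distribution.sample(self.rng)


policy = SketchDistributionPolicy(StubDistribution(0.8), actions=[1, 0, 1], random_seed=0)
print(policy.mode)  # 1, since p >= 0.5
```

In the real class the same delegation happens symbolically, producing tf.Tensor outputs instead of Python scalars.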
class actorcritic.policies.Policy

    Bases: object

    Base class for stochastic policies.
    entropy
        tf.Tensor – Computes the entropy of this policy based on the inputs
        that are provided for computing the probabilities. The shape equals
        the shape of the inputs.

    log_prob
        tf.Tensor – Computes the log-probability of the given actions based
        on the inputs that are provided for computing the probabilities. The
        shape equals the shape of the actions and the inputs.

    mode
        tf.Tensor – Selects the actions from this policy that have the
        highest probability (the mode), based on the inputs that are provided
        for computing the probabilities. The shape equals the shape of the
        inputs.
    register_predictive_distribution(layer_collection, random_seed=None)

        Registers the predictive distribution of this policy in the specified
        kfac.LayerCollection (required for K-FAC).

        Parameters:
            layer_collection (kfac.LayerCollection) – A layer collection used
                by the KfacOptimizer.
            random_seed (int, optional) – A random seed for sampling from the
                predictive distribution.

        Raises:
            NotImplementedError – If this policy does not support K-FAC.
    sample
        tf.Tensor – Samples actions from this policy based on the inputs that
        are provided for computing the probabilities. The shape equals the
        shape of the inputs.
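Policy only defines the contract; subclasses supply the actual tensors. A minimal sketch of what this abstract base class amounts to (the class name and bodies below are assumptions, since only the interface is documented here):

```python
class PolicySketch:
    """Sketch of the abstract Policy interface. Subclasses such as
    DistributionPolicy provide the entropy/log_prob/mode/sample tensors."""

    @property
    def entropy(self):
        raise NotImplementedError

    @property
    def log_prob(self):
        raise NotImplementedError

    @property
    def mode(self):
        raise NotImplementedError

    @property
    def sample(self):
        raise NotImplementedError

    def register_predictive_distribution(self, layer_collection, random_seed=None):
        # Policies that do not support K-FAC raise NotImplementedError.
        raise NotImplementedError


# A bare Policy supports nothing yet; callers must handle NotImplementedError.
try:
    PolicySketch().register_predictive_distribution(layer_collection=None)
except NotImplementedError:
    print("K-FAC not supported by the base class")
```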
class actorcritic.policies.SoftmaxPolicy(logits, actions, random_seed=None, name=None)

    Bases: actorcritic.policies.DistributionPolicy

    A stochastic policy that follows a categorical distribution.
    __init__(logits, actions, random_seed=None, name=None)

        Parameters:
            logits (tf.Tensor) – The input logits (or ‘scores’) used to
                compute the probabilities.
            actions (tf.Tensor) – The input actions used to compute the
                log-probabilities. Must have the same shape as the logits.
            random_seed (int, optional) – A random seed used for sampling.
            name (string, optional) – A name for this policy.
    register_predictive_distribution(layer_collection, random_seed=None)

        Registers the predictive distribution of this policy in the specified
        kfac.LayerCollection (required for K-FAC).

        Parameters:
            layer_collection (kfac.LayerCollection) – A layer collection used
                by the KfacOptimizer.
            random_seed (int, optional) – A random seed for sampling from the
                predictive distribution.
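To make the categorical semantics concrete, here is a small NumPy sketch of the quantities a SoftmaxPolicy exposes for a single state (this mirrors the standard softmax/categorical math; the `softmax` helper is illustrative, not part of the library):

```python
import numpy as np


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


logits = np.array([2.0, 1.0, 0.1])  # one state's action scores
probs = softmax(logits)

# log_prob: log-probability of a chosen action (here: action index 0)
action = 0
log_prob = np.log(probs[action])

# entropy of the categorical distribution
entropy = -np.sum(probs * np.log(probs))

# mode: the action with the highest probability
mode = int(np.argmax(probs))

# sample: draw an action according to the probabilities
rng = np.random.default_rng(seed=0)
sample = int(rng.choice(len(probs), p=probs))

print(mode)  # 0, since logits[0] is the largest
```

The real class computes the same quantities as tf.Tensor operations over batches of logits, which is why the output shapes match the shapes of the inputs.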