actorcritic.policies

Contains policies that determine the behavior of an agent.

Classes

DistributionPolicy(distribution, actions[, …])
    Base class for stochastic policies that follow a concrete tf.distributions.Distribution.
Policy
    Base class for stochastic policies.
SoftmaxPolicy(logits, actions[, …])
    A stochastic policy that follows a categorical distribution.
class actorcritic.policies.DistributionPolicy(distribution, actions, random_seed=None)[source]

Bases: actorcritic.policies.Policy

Base class for stochastic policies that follow a concrete tf.distributions.Distribution. Implements the required attributes (entropy, log_prob, mode, and sample) based on this distribution.

__init__(distribution, actions, random_seed=None)[source]
Parameters:
  • distribution (tf.distributions.Distribution) – The distribution.
  • actions (tf.Tensor) – The input actions used to compute the log-probabilities. Must have the same shape as the inputs.
  • random_seed (int, optional) – A random seed used for sampling.
entropy

tf.Tensor – Computes the entropy of this policy based on the inputs that are provided for computing the probabilities. The shape equals the shape of the inputs.

log_prob

tf.Tensor – Computes the log-probability of the given actions based on the inputs that are provided for computing the probabilities. The shape equals the shape of the actions, which matches the shape of the inputs.

mode

tf.Tensor – Selects the actions with the highest probability (the mode) based on the inputs that are provided for computing the probabilities. The shape equals the shape of the inputs.

sample

tf.Tensor – Samples actions from this policy based on the inputs that are provided for computing the probabilities. The shape equals the shape of the inputs.
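
For illustration, a minimal sketch of using DistributionPolicy directly with a categorical distribution. This assumes TensorFlow 1.x; the placeholder shapes and the number of actions are made up for this example:

    import tensorflow as tf

    from actorcritic.policies import DistributionPolicy

    num_actions = 4  # hypothetical size of the action space

    # Inputs: logits produced by some model, and the actions that were taken
    # (integer indices into the last dimension of the logits).
    logits = tf.placeholder(tf.float32, shape=[None, num_actions])
    actions = tf.placeholder(tf.int32, shape=[None])

    distribution = tf.distributions.Categorical(logits=logits)
    policy = DistributionPolicy(distribution, actions, random_seed=42)

    policy.sample    # tf.Tensor of sampled actions
    policy.mode      # tf.Tensor of the most probable actions
    policy.entropy   # tf.Tensor of per-input entropies
    policy.log_prob  # tf.Tensor of log-probabilities of the given actions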

class actorcritic.policies.Policy[source]

Bases: object

Base class for stochastic policies.

entropy

tf.Tensor – Computes the entropy of this policy based on the inputs that are provided for computing the probabilities. The shape equals the shape of the inputs.

log_prob

tf.Tensor – Computes the log-probability of the given actions based on the inputs that are provided for computing the probabilities. The shape equals the shape of the actions, which matches the shape of the inputs.

mode

tf.Tensor – Selects the actions with the highest probability (the mode) based on the inputs that are provided for computing the probabilities. The shape equals the shape of the inputs.

register_predictive_distribution(layer_collection, random_seed=None)[source]

Registers the predictive distribution of this policy in the specified kfac.LayerCollection (required for K-FAC).

Parameters:
  • layer_collection (kfac.LayerCollection) – A layer collection used by the KfacOptimizer.
  • random_seed (int, optional) – A random seed for sampling from the predictive distribution.
Raises:
  • NotImplementedError – If this policy does not support K-FAC.

sample

tf.Tensor – Samples actions from this policy based on the inputs that are provided for computing the probabilities. The shape equals the shape of the inputs.
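
Policy itself provides no implementations; a subclass supplies the four tensors and, optionally, K-FAC support. The following hypothetical skeleton (not part of the library) sketches a continuous-action policy, assuming the attributes are exposed as properties and using the register_normal_predictive_distribution method that kfac.LayerCollection provides:

    import tensorflow as tf

    from actorcritic.policies import Policy

    class NormalPolicy(Policy):
        # Hypothetical policy over a normal distribution, for illustration only.

        def __init__(self, mean, stddev, actions):
            self._distribution = tf.distributions.Normal(loc=mean, scale=stddev)
            self._actions = actions

        @property
        def entropy(self):
            return self._distribution.entropy()

        @property
        def log_prob(self):
            return self._distribution.log_prob(self._actions)

        @property
        def mode(self):
            # The mode of a normal distribution is its mean.
            return self._distribution.mean()

        @property
        def sample(self):
            return self._distribution.sample()

        def register_predictive_distribution(self, layer_collection, random_seed=None):
            # kfac only needs the mean here; its variance argument defaults to
            # a fixed scalar.
            layer_collection.register_normal_predictive_distribution(
                self._distribution.mean(), seed=random_seed)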

class actorcritic.policies.SoftmaxPolicy(logits, actions, random_seed=None, name=None)[source]

Bases: actorcritic.policies.DistributionPolicy

A stochastic policy that follows a categorical distribution.

__init__(logits, actions, random_seed=None, name=None)[source]
Parameters:
  • logits (tf.Tensor) – The input logits (or ‘scores’) used to compute the probabilities.
  • actions (tf.Tensor) – The input actions used to compute the log-probabilities. Must have the same shape as logits excluding the last dimension, which indexes the possible actions.
  • random_seed (int, optional) – A random seed used for sampling.
  • name (string, optional) – A name for this policy.
register_predictive_distribution(layer_collection, random_seed=None)[source]

Registers the predictive distribution of this policy in the specified kfac.LayerCollection (required for K-FAC).

Parameters:
  • layer_collection (kfac.LayerCollection) – A layer collection used by the KfacOptimizer.
  • random_seed (int, optional) – A random seed for sampling from the predictive distribution.
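
A sketch of typical usage in an actor-critic setup. This assumes TensorFlow 1.x and the standalone kfac package; the network, placeholder shapes, and the advantage-weighted loss are illustrative, not part of this API:

    import kfac
    import tensorflow as tf

    from actorcritic.policies import SoftmaxPolicy

    num_actions = 6  # hypothetical size of the action space

    observations = tf.placeholder(tf.float32, shape=[None, 128])
    actions = tf.placeholder(tf.int32, shape=[None])
    advantages = tf.placeholder(tf.float32, shape=[None])

    # A toy policy network that produces the logits ('scores').
    logits = tf.layers.dense(observations, num_actions)
    policy = SoftmaxPolicy(logits, actions, random_seed=0, name='policy')

    # Typical actor-critic loss terms built from the policy's tensors.
    policy_loss = -tf.reduce_mean(policy.log_prob * advantages)
    entropy_bonus = tf.reduce_mean(policy.entropy)
    loss = policy_loss - 0.01 * entropy_bonus

    # For K-FAC (e.g. ACKTR), register the predictive distribution in the
    # layer collection that the KfacOptimizer uses.
    layer_collection = kfac.LayerCollection()
    policy.register_predictive_distribution(layer_collection)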