actorcritic.policies
Contains policies that determine the behavior of an agent.
Classes
DistributionPolicy(distribution, actions[, …]) | Base class for stochastic policies that follow a concrete tf.distributions.Distribution.
Policy | Base class for stochastic policies.
SoftmaxPolicy(logits, actions[, …]) | A stochastic policy that follows a categorical distribution.
class actorcritic.policies.DistributionPolicy(distribution, actions, random_seed=None)
Bases: actorcritic.policies.Policy
Base class for stochastic policies that follow a concrete tf.distributions.Distribution. Implements the required methods based on this distribution.
__init__(distribution, actions, random_seed=None)
Parameters:
- distribution (tf.distributions.Distribution) – The distribution.
- actions (tf.Tensor) – The input actions used to compute the log-probabilities. Must have the same shape as the inputs.
- random_seed (int, optional) – A random seed used for sampling.
entropy
tf.Tensor – Computes the entropy of this policy based on the inputs that are provided for computing the probabilities. The shape equals the shape of the inputs.
log_prob
tf.Tensor – Computes the log-probability of the given actions based on the inputs that are provided for computing the probabilities. The shape equals the shape of the actions and the inputs.
mode
tf.Tensor – Selects actions from this policy which have the highest probability (mode) based on the inputs that are provided for computing the probabilities. The shape equals the shape of the inputs.
sample
tf.Tensor – Samples actions from this policy based on the inputs that are provided for computing the probabilities. The shape equals the shape of the inputs.
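The delegation described for DistributionPolicy can be illustrated with a small pure-Python sketch. It is not the library's implementation: `BernoulliSketch` and `DistributionPolicySketch` are hypothetical names, plain lists stand in for tensors, and a two-action Bernoulli distribution stands in for a `tf.distributions.Distribution`. It only shows how the four properties might forward to methods of the wrapped distribution.

```python
import math
import random


class BernoulliSketch:
    """Hypothetical stand-in for a distribution object (not tf.distributions)."""

    def __init__(self, probs):
        # Probability of choosing action 1, one entry per input.
        self.probs = list(probs)

    def entropy(self):
        def h(p):
            # Skip zero-probability outcomes so that 0 * log(0) counts as 0.
            return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0.0)
        return [h(p) for p in self.probs]

    def log_prob(self, actions):
        # Assumes every given action has non-zero probability.
        return [math.log(p if a == 1 else 1.0 - p)
                for p, a in zip(self.probs, actions)]

    def mode(self):
        return [1 if p >= 0.5 else 0 for p in self.probs]

    def sample(self, seed=None):
        rng = random.Random(seed)
        return [1 if rng.random() < p else 0 for p in self.probs]


class DistributionPolicySketch:
    """Exposes the four policy quantities by delegating to a distribution."""

    def __init__(self, distribution, actions, random_seed=None):
        self._distribution = distribution
        self._actions = actions
        self._random_seed = random_seed

    @property
    def entropy(self):
        return self._distribution.entropy()

    @property
    def log_prob(self):
        return self._distribution.log_prob(self._actions)

    @property
    def mode(self):
        return self._distribution.mode()

    @property
    def sample(self):
        return self._distribution.sample(seed=self._random_seed)


policy = DistributionPolicySketch(
    BernoulliSketch([0.9, 0.5, 0.2]), actions=[1, 0, 0], random_seed=0)
print(policy.mode)      # most probable action per input
print(policy.log_prob)  # log-probability of each given action
```

Each property returns one value per input, which mirrors the shape statements in the property descriptions above.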
class actorcritic.policies.Policy
Bases: object
Base class for stochastic policies.
entropy
tf.Tensor – Computes the entropy of this policy based on the inputs that are provided for computing the probabilities. The shape equals the shape of the inputs.
log_prob
tf.Tensor – Computes the log-probability of the given actions based on the inputs that are provided for computing the probabilities. The shape equals the shape of the actions and the inputs.
mode
tf.Tensor – Selects actions from this policy which have the highest probability (mode) based on the inputs that are provided for computing the probabilities. The shape equals the shape of the inputs.
register_predictive_distribution(layer_collection, random_seed=None)
Registers the predictive distribution of this policy in the specified kfac.LayerCollection (required for K-FAC).
Parameters:
- layer_collection (kfac.LayerCollection) – A layer collection used by the KfacOptimizer.
- random_seed (int, optional) – A random seed for sampling from the predictive distribution.
Raises: NotImplementedError – If this policy does not support K-FAC.
sample
tf.Tensor – Samples actions from this policy based on the inputs that are provided for computing the probabilities. The shape equals the shape of the inputs.
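Because register_predictive_distribution is documented to raise NotImplementedError for policies without K-FAC support, calling code may want to guard the call. The sketch below is a hypothetical mirror of the interface, not the library's class: `PolicySketch`, `RecordingPolicy`, and `try_register` are made-up names, a plain list stands in for `kfac.LayerCollection`, and raising from the base class by default is an assumption.

```python
class PolicySketch:
    """Hypothetical mirror of the interface above; not the library's class."""

    @property
    def sample(self):
        raise NotImplementedError

    @property
    def mode(self):
        raise NotImplementedError

    @property
    def entropy(self):
        raise NotImplementedError

    @property
    def log_prob(self):
        raise NotImplementedError

    def register_predictive_distribution(self, layer_collection, random_seed=None):
        # Assumed default: policies without K-FAC support raise here.
        raise NotImplementedError("This policy does not support K-FAC.")


class RecordingPolicy(PolicySketch):
    """Toy subclass that 'supports' registration by recording it in a list."""

    def register_predictive_distribution(self, layer_collection, random_seed=None):
        # A plain list stands in for kfac.LayerCollection in this sketch.
        layer_collection.append(("predictive_distribution", random_seed))


def try_register(policy, layer_collection, random_seed=None):
    """Returns True if registration succeeded, False if K-FAC is unsupported."""
    try:
        policy.register_predictive_distribution(layer_collection, random_seed)
    except NotImplementedError:
        return False
    return True


collection = []
print(try_register(PolicySketch(), collection))          # unsupported policy
print(try_register(RecordingPolicy(), collection, 7))    # supported policy
```

A guard like this lets the same training setup fall back to a non-K-FAC optimizer when a policy does not support registration.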
class actorcritic.policies.SoftmaxPolicy(logits, actions, random_seed=None, name=None)
Bases: actorcritic.policies.DistributionPolicy
A stochastic policy that follows a categorical distribution.
__init__(logits, actions, random_seed=None, name=None)
Parameters:
- logits (tf.Tensor) – The input logits (or ‘scores’) used to compute the probabilities.
- actions (tf.Tensor) – The input actions used to compute the log-probabilities. Must have the same shape as logits.
- random_seed (int, optional) – A random seed used for sampling.
- name (string, optional) – A name for this policy.
register_predictive_distribution(layer_collection, random_seed=None)
Registers the predictive distribution of this policy in the specified kfac.LayerCollection (required for K-FAC).
Parameters:
- layer_collection (kfac.LayerCollection) – A layer collection used by the KfacOptimizer.
- random_seed (int, optional) – A random seed for sampling from the predictive distribution.
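To make the quantities of a categorical policy concrete, the following pure-Python sketch computes them for a single row of logits and a single integer action index. This representation is an assumption chosen for illustration, and `log_softmax` and `categorical_summary` are hypothetical helpers, not functions of this module or of TensorFlow.

```python
import math
import random


def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def categorical_summary(logits, action, seed=None):
    """Log-probability, entropy, mode and one sample for one row of logits."""
    log_p = log_softmax(logits)
    probs = [math.exp(lp) for lp in log_p]
    # Entropy of the categorical distribution: -sum_i p_i * log p_i.
    entropy = -sum(p * lp for p, lp in zip(probs, log_p))
    # The mode is the index of the largest logit (first index on ties).
    mode = max(range(len(logits)), key=lambda i: logits[i])
    # Draw one action index with probability proportional to softmax(logits).
    sample = random.Random(seed).choices(range(len(logits)), weights=probs)[0]
    return {"log_prob": log_p[action], "entropy": entropy,
            "mode": mode, "sample": sample}


summary = categorical_summary([2.0, 0.0, -1.0], action=0, seed=1)
print(summary)
```

Subtracting the maximum logit before exponentiating avoids overflow for large logits without changing the resulting probabilities.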