actorcritic.policies

Contains policies that determine the behavior of an agent.

Classes

DistributionPolicy(distribution, actions[, …])
    Base class for stochastic policies that follow a concrete tf.distributions.Distribution.

Policy
    Base class for stochastic policies.

SoftmaxPolicy(logits, actions[, …])
    A stochastic policy that follows a categorical distribution.
class actorcritic.policies.DistributionPolicy(distribution, actions, random_seed=None)

    Bases: actorcritic.policies.Policy

    Base class for stochastic policies that follow a concrete
    tf.distributions.Distribution. Implements the required methods based on
    this distribution.

    __init__(distribution, actions, random_seed=None)

        Parameters:
            distribution (tf.distributions.Distribution) – The distribution.
            actions (tf.Tensor) – The input actions used to compute the
                log-probabilities. Must have the same shape as the inputs.
            random_seed (int, optional) – A random seed used for sampling.
    entropy
        tf.Tensor – Computes the entropy of this policy based on the inputs
        that are provided for computing the probabilities. The shape equals
        the shape of the inputs.

    log_prob
        tf.Tensor – Computes the log-probability of the given actions based
        on the inputs that are provided for computing the probabilities. The
        shape equals the shape of the actions and the inputs.

    mode
        tf.Tensor – Selects the actions from this policy that have the
        highest probability (the mode), based on the inputs that are provided
        for computing the probabilities. The shape equals the shape of the
        inputs.

    sample
        tf.Tensor – Samples actions from this policy based on the inputs that
        are provided for computing the probabilities. The shape equals the
        shape of the inputs.
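The point of DistributionPolicy is that every operation is forwarded to the wrapped distribution. As a rough, framework-free sketch of that delegation pattern (the `StubDistribution` and `SketchDistributionPolicy` classes below are hypothetical; the stub stands in for a `tf.distributions.Distribution` such as a Bernoulli):

```python
import math
import random


class StubDistribution:
    """Hypothetical stand-in for a tf.distributions.Distribution (Bernoulli)."""

    def __init__(self, p):
        self.p = p  # probability of action 1

    def log_prob(self, action):
        return math.log(self.p if action == 1 else 1.0 - self.p)

    def entropy(self):
        p = self.p
        return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

    def mode(self):
        return 1 if self.p >= 0.5 else 0

    def sample(self, rng):
        return 1 if rng.random() < self.p else 0


class SketchDistributionPolicy:
    """Sketch of DistributionPolicy: each property delegates to the distribution."""

    def __init__(self, distribution, actions, random_seed=None):
        self.distribution = distribution
        self.actions = actions
        self.rng = random.Random(random_seed)

    @property
    def log_prob(self):
        # Log-probability of each of the given input actions.
        return [self.distribution.log_prob(a) for a in self.actions]

    @property
    def entropy(self):
        return self.distribution.entropy()

    @property
    def mode(self):
        return self.distribution.mode()

    @property
    def sample(self):
        return self.distribution.sample(self.rng)


policy = SketchDistributionPolicy(StubDistribution(0.8), actions=[1, 0, 1], random_seed=0)
print(policy.mode)  # 1, since p >= 0.5
```

In the real class the same delegation happens symbolically, producing tf.Tensor outputs instead of Python scalars.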
class actorcritic.policies.Policy

    Bases: object

    Base class for stochastic policies.
    entropy
        tf.Tensor – Computes the entropy of this policy based on the inputs
        that are provided for computing the probabilities. The shape equals
        the shape of the inputs.

    log_prob
        tf.Tensor – Computes the log-probability of the given actions based
        on the inputs that are provided for computing the probabilities. The
        shape equals the shape of the actions and the inputs.

    mode
        tf.Tensor – Selects the actions from this policy that have the
        highest probability (the mode), based on the inputs that are provided
        for computing the probabilities. The shape equals the shape of the
        inputs.
    register_predictive_distribution(layer_collection, random_seed=None)

        Registers the predictive distribution of this policy in the specified
        kfac.LayerCollection (required for K-FAC).

        Parameters:
            layer_collection (kfac.LayerCollection) – A layer collection used
                by the KfacOptimizer.
            random_seed (int, optional) – A random seed for sampling from the
                predictive distribution.

        Raises:
            NotImplementedError – If this policy does not support K-FAC.
    sample
        tf.Tensor – Samples actions from this policy based on the inputs that
        are provided for computing the probabilities. The shape equals the
        shape of the inputs.
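Policy only defines the contract; subclasses supply the actual tensors. A minimal sketch of what this abstract base class amounts to (the class name and bodies below are assumptions, since only the interface is documented here):

```python
class PolicySketch:
    """Sketch of the abstract Policy interface. Subclasses such as
    DistributionPolicy provide the entropy/log_prob/mode/sample tensors."""

    @property
    def entropy(self):
        raise NotImplementedError

    @property
    def log_prob(self):
        raise NotImplementedError

    @property
    def mode(self):
        raise NotImplementedError

    @property
    def sample(self):
        raise NotImplementedError

    def register_predictive_distribution(self, layer_collection, random_seed=None):
        # Policies that do not support K-FAC raise NotImplementedError.
        raise NotImplementedError


# A bare Policy supports nothing yet; callers must handle NotImplementedError.
try:
    PolicySketch().register_predictive_distribution(layer_collection=None)
except NotImplementedError:
    print("K-FAC not supported by the base class")
```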
class actorcritic.policies.SoftmaxPolicy(logits, actions, random_seed=None, name=None)

    Bases: actorcritic.policies.DistributionPolicy

    A stochastic policy that follows a categorical distribution.
    __init__(logits, actions, random_seed=None, name=None)

        Parameters:
            logits (tf.Tensor) – The input logits (or ‘scores’) used to
                compute the probabilities.
            actions (tf.Tensor) – The input actions used to compute the
                log-probabilities. Must have the same shape as the logits.
            random_seed (int, optional) – A random seed used for sampling.
            name (string, optional) – A name for this policy.
    register_predictive_distribution(layer_collection, random_seed=None)

        Registers the predictive distribution of this policy in the specified
        kfac.LayerCollection (required for K-FAC).

        Parameters:
            layer_collection (kfac.LayerCollection) – A layer collection used
                by the KfacOptimizer.
            random_seed (int, optional) – A random seed for sampling from the
                predictive distribution.
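To make the categorical semantics concrete, here is a small NumPy sketch of the quantities a SoftmaxPolicy exposes for a single state (this mirrors the standard softmax/categorical math; the `softmax` helper is illustrative, not part of the library):

```python
import numpy as np


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


logits = np.array([2.0, 1.0, 0.1])  # one state's action scores
probs = softmax(logits)

# log_prob: log-probability of a chosen action (here: action index 0)
action = 0
log_prob = np.log(probs[action])

# entropy of the categorical distribution
entropy = -np.sum(probs * np.log(probs))

# mode: the action with the highest probability
mode = int(np.argmax(probs))

# sample: draw an action according to the probabilities
rng = np.random.default_rng(seed=0)
sample = int(rng.choice(len(probs), p=probs))

print(mode)  # 0, since logits[0] is the largest
```

The real class computes the same quantities as tf.Tensor operations over batches of logits, which is why the output shapes match the shapes of the inputs.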