actorcritic.baselines

Contains baselines, which are used to compute the advantage.

Classes

  • Baseline – A wrapper class for the baseline that is subtracted from the target values to get the advantage.
  • StateValueFunction(value) – A baseline defined by a state-value function.

class actorcritic.baselines.Baseline

Bases: object

A wrapper class for the baseline that is subtracted from the target values to get the advantage.
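The subtraction described above (advantage = target value minus baseline value) can be sketched without TensorFlow. This is a minimal sketch under stated assumptions: plain Python lists stand in for the tf.Tensor objects, and compute_advantage is a hypothetical helper, not part of actorcritic.baselines.

```python
from typing import List, Sequence


def compute_advantage(targets: Sequence[float], baseline_values: Sequence[float]) -> List[float]:
    """Hypothetical helper: subtract the baseline from the targets element-wise.

    In the library these would be tf.Tensor objects (for example, a baseline's
    `value`); plain floats are used here so the sketch runs without TensorFlow.
    """
    if len(targets) != len(baseline_values):
        raise ValueError("targets and baseline_values must have the same length")
    # advantage = target - baseline: positive when the outcome beat the baseline's estimate.
    return [t - b for t, b in zip(targets, baseline_values)]


# Example: discounted returns as targets, state-value estimates as the baseline.
returns = [1.0, 0.5, -0.25]
values = [0.75, 0.5, 0.0]
print(compute_advantage(returns, values))  # [0.25, 0.0, -0.25]
```

Actor-critic implementations commonly wrap the baseline in tf.stop_gradient when forming the policy loss, so that the policy gradient does not flow into the baseline. Whether this library does so is not stated in this section.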

register_predictive_distribution(layer_collection, random_seed=None)

Registers the predictive distribution of this baseline in the specified kfac.LayerCollection (required for K-FAC).

Parameters:
  • layer_collection (kfac.LayerCollection) – A layer collection used by the KfacOptimizer.
  • random_seed (int, optional) – A random seed for sampling from the predictive distribution.
Raises:
  • NotImplementedError – If this baseline does not support K-FAC.

value

tf.Tensor – The output values of this baseline.
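The contract documented above consists of a value property and a K-FAC registration hook that raises NotImplementedError when unsupported. The following minimal, framework-free sketch illustrates that contract. BaselineSketch and ConstantBaseline are hypothetical names. The documented Baseline derives from object rather than abc.ABC, so the abstract-property mechanism here is an assumption made for illustration.

```python
import abc


class BaselineSketch(abc.ABC):
    """Hypothetical mirror of the documented Baseline interface (not the library source)."""

    @property
    @abc.abstractmethod
    def value(self):
        """The output values of this baseline (a tf.Tensor in the real library)."""

    def register_predictive_distribution(self, layer_collection, random_seed=None):
        # Assumed default: baselines that cannot express a predictive
        # distribution for K-FAC refuse the registration.
        raise NotImplementedError(f"{type(self).__name__} does not support K-FAC")


class ConstantBaseline(BaselineSketch):
    """Hypothetical baseline that returns the same estimate for every state."""

    def __init__(self, constant, batch_size):
        self._values = [constant] * batch_size

    @property
    def value(self):
        return self._values


baseline = ConstantBaseline(0.5, batch_size=3)
print(baseline.value)  # [0.5, 0.5, 0.5]
try:
    baseline.register_predictive_distribution(layer_collection=None)
except NotImplementedError as err:
    print(err)  # ConstantBaseline does not support K-FAC
```

A caller that uses K-FAC only optionally could catch NotImplementedError from register_predictive_distribution and fall back to a first-order optimizer.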

class actorcritic.baselines.StateValueFunction(value)

Bases: actorcritic.baselines.Baseline

A baseline defined by a state-value function.

__init__(value)

Parameters:
  • value (tf.Tensor) – The output values of this state-value function.

register_predictive_distribution(layer_collection, random_seed=None)

Registers the predictive distribution (normal distribution) of this state-value function in the specified kfac.LayerCollection (required for K-FAC).

Parameters:
  • layer_collection (kfac.LayerCollection) – A layer collection used by the KfacOptimizer.
  • random_seed (int, optional) – A random seed for sampling from the predictive distribution.
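A normal predictive distribution centred on the predicted state value connects K-FAC's Fisher approximation to the usual squared-error value loss. The kfac package's LayerCollection provides register_normal_predictive_distribution(mean, var=0.5, seed=None, ...). This section does not say which arguments this library passes, so the sketch below assumes var=0.5, that method's default. Using only the standard library, it draws the seeded samples that random_seed controls and checks that, at var=0.5, the negative log-likelihood equals the squared error plus a constant. sample_predictive and normal_nll are hypothetical helpers.

```python
import math
import random


def sample_predictive(values, var=0.5, random_seed=None):
    """Draw one sample per state from N(value, var).

    Hypothetical stand-in for the sampling K-FAC performs from a registered
    normal predictive distribution; var=0.5 is an assumption.
    """
    rng = random.Random(random_seed)
    std = math.sqrt(var)
    return [rng.gauss(v, std) for v in values]


def normal_nll(target, mean, var=0.5):
    """Negative log-likelihood of `target` under N(mean, var)."""
    return 0.5 * math.log(2 * math.pi * var) + (target - mean) ** 2 / (2 * var)


values = [0.2, -1.0, 3.5]  # predicted state values (the baseline's `value`)

# The same seed reproduces the same samples, which is what random_seed is for.
assert sample_predictive(values, random_seed=0) == sample_predictive(values, random_seed=0)

# With var = 0.5, (y - m)^2 / (2 * 0.5) = (y - m)^2, so the NLL is the squared
# error plus the constant 0.5 * log(pi).
const = 0.5 * math.log(math.pi)
for y, m in [(1.0, 0.2), (-2.0, -1.0)]:
    assert math.isclose(normal_nll(y, m), (y - m) ** 2 + const)
```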

value

tf.Tensor – The output values of this state-value function.