actorcritic.baselines

Contains baselines, which are subtracted from the target values to compute the advantage.
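As a minimal sketch of how a baseline is used, the advantage at each time step is the target value minus the baseline's output for that step. The helper `compute_advantages` is hypothetical and not part of this module. Plain Python lists stand in for the `tf.Tensor` that `Baseline.value` would provide.

```python
from typing import List, Sequence


def compute_advantages(targets: Sequence[float], baseline_values: Sequence[float]) -> List[float]:
    """Hypothetical helper: advantage = target value - baseline value, per time step."""
    if len(targets) != len(baseline_values):
        raise ValueError("targets and baseline_values must have the same length")
    # A state-dependent baseline leaves the expected policy gradient unchanged
    # while typically reducing its variance.
    return [t - b for t, b in zip(targets, baseline_values)]


# Illustrative targets (e.g. n-step returns) and state-value estimates for three steps.
targets = [1.0, 0.5, 2.0]
baseline_values = [0.75, 0.5, 2.5]
print(compute_advantages(targets, baseline_values))  # [0.25, 0.0, -0.5]
```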

Classes

`Baseline` – A wrapper class for the baseline that is subtracted from the target values to get the advantage.
`StateValueFunction(value)` – A baseline defined by a state-value function.

class actorcritic.baselines.Baseline

Bases: object

A wrapper class for the baseline that is subtracted from the target values to get the advantage.

register_predictive_distribution(layer_collection, random_seed=None)

Registers the predictive distribution of this baseline in the specified `kfac.LayerCollection` (required for K-FAC).

Parameters:
    layer_collection (`kfac.LayerCollection`) – A layer collection used by the `KfacOptimizer`.
    random_seed (`int`, optional) – A random seed for sampling from the predictive distribution.
Raises:
    `NotImplementedError` – If this baseline does not support K-FAC.

value: `tf.Tensor` – The output values of this baseline.
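The contract described above can be mirrored in plain Python to show how subclasses and callers interact with it. It has two parts: a `value` attribute, and a `register_predictive_distribution` method that raises `NotImplementedError` when K-FAC is unsupported. This is a hypothetical stand-in, not the library's source. `SketchBaseline`, `ConstantBaseline`, `RecordingBaseline`, and `try_register` are illustrative names, and a plain list replaces both the `tf.Tensor` and the `kfac.LayerCollection`.

```python
from typing import List, Optional


class SketchBaseline:
    """Hypothetical mirror of the documented Baseline contract (not the library's code)."""

    @property
    def value(self):
        # Subclasses return their output values (a tf.Tensor in the real library).
        raise NotImplementedError

    def register_predictive_distribution(self, layer_collection, random_seed: Optional[int] = None):
        # Default behavior: this baseline does not support K-FAC.
        raise NotImplementedError


class ConstantBaseline(SketchBaseline):
    """Hypothetical baseline without K-FAC support."""

    def __init__(self, values: List[float]):
        self._values = list(values)

    @property
    def value(self) -> List[float]:
        return self._values


class RecordingBaseline(ConstantBaseline):
    """Hypothetical baseline with K-FAC support; a list stands in for the layer collection."""

    def register_predictive_distribution(self, layer_collection, random_seed: Optional[int] = None):
        layer_collection.append((self.value, random_seed))


def try_register(baseline: SketchBaseline, layer_collection, random_seed: Optional[int] = None) -> bool:
    """Register the baseline's predictive distribution if supported; return whether it was."""
    try:
        baseline.register_predictive_distribution(layer_collection, random_seed=random_seed)
        return True
    except NotImplementedError:
        return False


collection = []
print(try_register(ConstantBaseline([0.0, 1.0]), collection))  # False
print(try_register(RecordingBaseline([0.5]), collection, random_seed=7))  # True
print(collection)  # [([0.5], 7)]
```

Catching `NotImplementedError` in the sketch lets a caller detect that a baseline cannot take part in K-FAC and choose a different optimizer, rather than failing later.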

class actorcritic.baselines.StateValueFunction(value)

Bases: actorcritic.baselines.Baseline

A baseline defined by a state-value function.

__init__(value)

Parameters:
    value (`tf.Tensor`) – The output values of this state-value function.

register_predictive_distribution(layer_collection, random_seed=None)

Registers the predictive distribution of this state-value function, a normal distribution, in the specified `kfac.LayerCollection` (required for K-FAC).

Parameters:
    layer_collection (`kfac.LayerCollection`) – A layer collection used by the `KfacOptimizer`.
    random_seed (`int`, optional) – A random seed for sampling from the predictive distribution.
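A normal predictive distribution fits a state-value function because the negative log-likelihood of a target y under N(v, σ²) is (y - v)²/(2σ²) plus a constant. A value function trained with squared error therefore corresponds to a Gaussian model with fixed variance. K-FAC can estimate the Fisher matrix from targets sampled from the model's predictive distribution, and `random_seed` controls that sampling. The pure-Python sketch below illustrates the sampling step and the squared-error correspondence under stated assumptions. The helpers `sample_targets` and `gaussian_nll` are hypothetical, and the variance of 0.5 is an illustrative choice, not a value taken from this library or from `kfac`.

```python
import math
import random
from typing import List, Optional, Sequence


def sample_targets(values: Sequence[float], variance: float = 0.5,
                   random_seed: Optional[int] = None) -> List[float]:
    """Hypothetical helper: draw one target per value from N(value, variance)."""
    rng = random.Random(random_seed)  # a fixed seed makes the draws reproducible
    std = math.sqrt(variance)
    return [rng.gauss(v, std) for v in values]


def gaussian_nll(target: float, mean: float, variance: float = 0.5) -> float:
    """Negative log-likelihood of target under N(mean, variance)."""
    return (target - mean) ** 2 / (2 * variance) + 0.5 * math.log(2 * math.pi * variance)


values = [0.2, -1.0, 3.5]  # illustrative state-value outputs
a = sample_targets(values, random_seed=42)
b = sample_targets(values, random_seed=42)
print(a == b)  # True: same seed, same samples
# With variance 0.5 the NLL equals the squared error plus a constant:
print(round(gaussian_nll(1.0, 0.0) - gaussian_nll(0.0, 0.0), 6))  # 1.0
```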

value: `tf.Tensor` – The output values of this state-value function.