actorcritic.baselines
Contains baselines, which are used to compute the advantage.
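Written out, the advantage is the target value minus the baseline. The notation is chosen here for illustration and is not taken from the library: $y_t$ is the target value at step $t$, $s_t$ is the state and $b$ is the baseline.

$$
A_t = y_t - b(s_t)
$$

For a StateValueFunction, $b(s_t)$ is the value estimate $V(s_t)$. In standard policy-gradient derivations, a baseline that depends only on the state leaves the expected gradient unchanged. It typically also reduces the gradient's variance.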
Classes
Baseline | A wrapper class for the baseline that is subtracted from the target values to get the advantage.
StateValueFunction(value) | A baseline defined by a state-value function.
class actorcritic.baselines.Baseline
Bases: object
A wrapper class for the baseline that is subtracted from the target values to get the advantage.
register_predictive_distribution(layer_collection, random_seed=None)
Registers the predictive distribution of this baseline in the specified kfac.LayerCollection (required for K-FAC).
Parameters:
- layer_collection (kfac.LayerCollection) – A layer collection used by the KfacOptimizer.
- random_seed (int, optional) – A random seed for sampling from the predictive distribution.
Raises:
- NotImplementedError – If this baseline does not support K-FAC.
value
tf.Tensor – The output values of this baseline.
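The following plain-Python sketch mirrors the documented contract of Baseline to make it concrete. Subclasses expose value. The default register_predictive_distribution raises NotImplementedError, so a caller can detect baselines without K-FAC support. This is not the library source. ConstantBaseline and try_register are hypothetical names, plain lists stand in for tf.Tensor, and no real kfac objects are involved.

```python
from typing import Any, Optional, Sequence


class Baseline:
    """Hypothetical mirror of the documented Baseline interface."""

    @property
    def value(self) -> Sequence[float]:
        # The real class documents this as a tf.Tensor; a sequence stands in here.
        raise NotImplementedError

    def register_predictive_distribution(self, layer_collection: Any,
                                         random_seed: Optional[int] = None) -> None:
        # Baselines without K-FAC support keep this default behaviour.
        raise NotImplementedError


class ConstantBaseline(Baseline):
    """Illustrative baseline that predicts the same value for every state."""

    def __init__(self, constant: float, num_states: int) -> None:
        self._value = [constant] * num_states

    @property
    def value(self) -> Sequence[float]:
        return self._value


def try_register(baseline: Baseline, layer_collection: Any) -> bool:
    """Register the baseline for K-FAC if it supports it; report whether it did."""
    try:
        baseline.register_predictive_distribution(layer_collection)
    except NotImplementedError:
        return False
    return True


baseline = ConstantBaseline(0.5, num_states=3)
print(baseline.value)                # [0.5, 0.5, 0.5]
print(try_register(baseline, None))  # False
```

Because ConstantBaseline does not override register_predictive_distribution, try_register reports False instead of letting the error propagate. This is one possible way to handle baselines that are incompatible with K-FAC.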
class actorcritic.baselines.StateValueFunction(value)
Bases: actorcritic.baselines.Baseline
A baseline defined by a state-value function.
__init__(value)
Parameters:
- value (tf.Tensor) – The output values of this state-value function.
register_predictive_distribution(layer_collection, random_seed=None)
Registers the predictive distribution (a normal distribution) of this state-value function in the specified kfac.LayerCollection (required for K-FAC).
Parameters:
- layer_collection (kfac.LayerCollection) – A layer collection used by the KfacOptimizer.
- random_seed (int, optional) – A random seed for sampling from the predictive distribution.
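Registering a normal predictive distribution tells K-FAC to treat each output of the state-value function as the mean of a Gaussian. The pure-Python sketch below shows two ideas behind this.

- Drawing targets from that distribution, which is the sampling that random_seed controls.
- The Fisher information with respect to the mean. For a Gaussian with fixed variance this equals 1 / variance.

The sketch does not use or reproduce the kfac library. sample_targets, fisher_estimate and the fixed variance of 0.5 are assumptions for illustration.

```python
import random
from typing import List, Optional


def sample_targets(means: List[float], variance: float = 0.5,
                   random_seed: Optional[int] = None) -> List[float]:
    """Draw one target per output from N(mean, variance); reproducible if seeded."""
    rng = random.Random(random_seed)
    std = variance ** 0.5
    return [rng.gauss(m, std) for m in means]


def fisher_estimate(mean: float, variance: float = 0.5, num_samples: int = 100_000,
                    random_seed: Optional[int] = 0) -> float:
    """Monte Carlo estimate of E[(d/dmean log N(y; mean, variance))^2]."""
    samples = sample_targets([mean] * num_samples, variance, random_seed)
    # d/dmean log N(y; mean, variance) = (y - mean) / variance
    return sum(((y - mean) / variance) ** 2 for y in samples) / num_samples


values = [0.2, -1.0, 3.5]  # hypothetical outputs of a state-value function
print(sample_targets(values, random_seed=42) == sample_targets(values, random_seed=42))  # True
print(fisher_estimate(0.0))  # close to 1 / variance = 2.0
```

With a fixed variance, the Gaussian negative log-likelihood is a scaled squared error. A normal predictive distribution therefore fits naturally with a critic trained on a squared-error loss. This section does not say whether actorcritic trains its critic that way.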
value
tf.Tensor – The output values of this state-value function.
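A state-value-function baseline wraps the critic's outputs V(s) and exposes them as value. The sketch below is a hypothetical plain-Python stand-in, using lists instead of tf.Tensor. It shows how those values would be subtracted from target values to produce advantages. The advantages helper is illustrative and not part of the documented API.

```python
from typing import List


class StateValueFunction:
    """Hypothetical stand-in: stores the outputs V(s) of a state-value function."""

    def __init__(self, value: List[float]) -> None:
        self._value = list(value)

    @property
    def value(self) -> List[float]:
        return self._value


def advantages(targets: List[float], baseline: StateValueFunction) -> List[float]:
    # Advantage = target value minus the baseline's value for the same state.
    return [t - v for t, v in zip(targets, baseline.value)]


critic = StateValueFunction([0.5, 1.25, -0.5])
print(advantages([1.0, 1.0, 0.0], critic))  # [0.5, -0.25, 0.5]
```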