actorcritic.baselines
Contains baselines, which are used to compute the advantage.
Classes
Baseline | A wrapper class for the baseline that is subtracted from the target values to get the advantage.
StateValueFunction(value) | A baseline defined by a state-value function.
class actorcritic.baselines.Baseline
Bases: object
A wrapper class for the baseline that is subtracted from the target values to get the advantage.
register_predictive_distribution(layer_collection, random_seed=None)
Registers the predictive distribution of this baseline in the specified kfac.LayerCollection (required for K-FAC).
Parameters:
- layer_collection (kfac.LayerCollection) – A layer collection used by the KfacOptimizer.
- random_seed (int, optional) – A random seed for sampling from the predictive distribution.
Raises: NotImplementedError – If this baseline does not support K-FAC.
value
tf.Tensor – The output values of this baseline.
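As a rough illustration of how a baseline enters the advantage, the sketch below subtracts baseline values from target values element by element. It uses plain Python lists in place of tf.Tensor objects, and the function name compute_advantage and the numbers are hypothetical; none of this is part of actorcritic.baselines.

```python
def compute_advantage(targets, baseline_values):
    """Subtract the baseline from the targets, element by element."""
    if len(targets) != len(baseline_values):
        raise ValueError("targets and baseline_values must have the same length")
    return [t - b for t, b in zip(targets, baseline_values)]


# Hypothetical numbers: target values and the baseline's output values.
targets = [1.0, 0.5, 2.0]
baseline_values = [0.75, 0.5, 1.0]
advantages = compute_advantage(targets, baseline_values)
```

A positive entry means the target exceeded the baseline's estimate for that step; a zero entry means the baseline matched the target exactly.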
class actorcritic.baselines.StateValueFunction(value)
Bases: actorcritic.baselines.Baseline
A baseline defined by a state-value function.
__init__(value)
Parameters: value (tf.Tensor) – The output values of this state-value function.
register_predictive_distribution(layer_collection, random_seed=None)
Registers the predictive distribution (normal distribution) of this state-value function in the specified kfac.LayerCollection (required for K-FAC).
Parameters:
- layer_collection (kfac.LayerCollection) – A layer collection used by the KfacOptimizer.
- random_seed (int, optional) – A random seed for sampling from the predictive distribution.
value
tf.Tensor – The output values of this state-value function.
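The relationship between the two classes can be sketched without TensorFlow or K-FAC installed. The example below is only a stand-in under stated assumptions: SketchBaseline, SketchStateValueFunction, RecordingLayerCollection, and the register_normal method are all hypothetical names invented for illustration, and the base class raising NotImplementedError by default is an assumption drawn from the "Raises" entry above, not from the library source.

```python
class SketchBaseline:
    """Stand-in for a baseline: holds output values, no K-FAC support by default."""

    def __init__(self, value):
        self._value = value

    @property
    def value(self):
        return self._value

    def register_predictive_distribution(self, layer_collection, random_seed=None):
        # Assumption: a baseline without K-FAC support signals it this way.
        raise NotImplementedError("this baseline does not support K-FAC")


class SketchStateValueFunction(SketchBaseline):
    """Stand-in for a state-value baseline that registers a normal distribution."""

    def register_predictive_distribution(self, layer_collection, random_seed=None):
        # Assumption: the collection offers some way to register a normal
        # predictive distribution centred on the predicted values.
        layer_collection.register_normal(mean=self.value, seed=random_seed)


class RecordingLayerCollection:
    """Hypothetical stub that records registrations instead of doing K-FAC work."""

    def __init__(self):
        self.registered = []

    def register_normal(self, mean, seed=None):
        self.registered.append(("normal", mean, seed))


collection = RecordingLayerCollection()
critic = SketchStateValueFunction(value=[0.75, 0.5, 1.0])
critic.register_predictive_distribution(collection, random_seed=42)

# A baseline that does not override the method reports missing K-FAC support.
try:
    SketchBaseline(value=[0.0]).register_predictive_distribution(collection)
    supports_kfac = True
except NotImplementedError:
    supports_kfac = False
```

In real use the value would be a tf.Tensor and the collection a kfac.LayerCollection; the stub only shows the calling pattern and how a caller might detect a baseline that cannot be used with K-FAC.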