Multivariate Gaussian model

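Multivariate Gaussian models implement stochastic policies for continuous action spaces. Actions are sampled from a multivariate normal distribution whose mean and log standard deviation are computed by the model.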

skrl provides a Python mixin (MultivariateGaussianMixin) that simplifies the creation of these models while leaving users full control over the definition and architecture of the function approximators. The mixin must be used according to the following rules:

  • In the multiple inheritance declaration, the Model base class must always be listed last.

    class MultivariateGaussianModel(MultivariateGaussianMixin, Model):
        def __init__(self, observation_space, action_space, device="cuda:0",
                     clip_actions=False, clip_log_std=True, min_log_std=-20, max_log_std=2):
            Model.__init__(self, observation_space, action_space, device)
            MultivariateGaussianMixin.__init__(self, clip_actions, clip_log_std, min_log_std, max_log_std)
    
  • The Model base class constructor must be invoked before the mixin's constructor.

    
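Both rules follow from how Python resolves methods and attributes under multiple inheritance. The sketch below uses two hypothetical, heavily simplified stand-ins (BaseModel for Model and GaussianMixin for MultivariateGaussianMixin); it is not skrl code. It assumes that both classes define an act method and that the mixin constructor reads an attribute (here action_space) that only the base constructor sets.

```python
class BaseModel:
    """Hypothetical stand-in for skrl's Model base class."""

    def __init__(self, action_space):
        # the base constructor stores the attributes the mixin relies on
        self.action_space = action_space

    def act(self, inputs):
        raise NotImplementedError("the base class does not implement act()")


class GaussianMixin:
    """Hypothetical stand-in for MultivariateGaussianMixin."""

    def __init__(self, clip_actions=False):
        # reads an attribute set by BaseModel.__init__, so that constructor must run first
        if clip_actions:
            self.clip_low, self.clip_high = self.action_space
        self.clip_actions = clip_actions

    def act(self, inputs):
        return "sampled action"


class GoodModel(GaussianMixin, BaseModel):           # base class listed last
    def __init__(self, action_space):
        BaseModel.__init__(self, action_space)         # 1. base constructor first
        GaussianMixin.__init__(self, clip_actions=True)  # 2. then the mixin


class BadOrderModel(GaussianMixin, BaseModel):
    def __init__(self, action_space):
        GaussianMixin.__init__(self, clip_actions=True)  # action_space not set yet
        BaseModel.__init__(self, action_space)


class BadBasesModel(BaseModel, GaussianMixin):       # base class listed first
    def __init__(self, action_space):
        BaseModel.__init__(self, action_space)
        GaussianMixin.__init__(self)


good = GoodModel(action_space=(-1.0, 1.0))
print(good.act({}))                                    # the mixin's act() comes first in the MRO
print([cls.__name__ for cls in GoodModel.__mro__])

try:
    BadOrderModel(action_space=(-1.0, 1.0))
except AttributeError as error:
    print("wrong constructor order:", error)

try:
    BadBasesModel(action_space=(-1.0, 1.0)).act({})
except NotImplementedError as error:
    print("wrong base class order:", error)
```

With the base class listed first, its placeholder act shadows the mixin's implementation; with the constructors swapped, the mixin looks for an attribute that does not exist yet.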

Concept

Figure: Multivariate Gaussian model
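As a rough, library-independent sketch of the concept: the network produces mean actions μ and log standard deviations log σ, log σ is optionally clamped to [min_log_std, max_log_std], an action is sampled as a = μ + σ·ε with ε drawn from a standard normal per dimension, and the log-density of a Gaussian with diagonal covariance is the sum of the per-dimension normal log-densities. The function names below are hypothetical; skrl itself works with torch.distributions, and details such as when action clipping is applied may differ.

```python
import math
import random


def clamp(value, low, high):
    return max(low, min(high, value))


def diag_gaussian_log_prob(action, mean, log_std):
    """Log-density of a diagonal Gaussian: sum of per-dimension normal log-densities."""
    total = 0.0
    for a, mu, ls in zip(action, mean, log_std):
        z = (a - mu) / math.exp(ls)
        total += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
    return total


def gaussian_act(mean, log_std, rng, min_log_std=-20.0, max_log_std=2.0,
                 clip_log_std=True, action_bounds=None):
    """Hypothetical sketch of one stochastic forward pass (not skrl's implementation)."""
    if clip_log_std:
        log_std = [clamp(ls, min_log_std, max_log_std) for ls in log_std]
    # reparameterized sample: a = mu + sigma * eps, eps ~ N(0, 1) per dimension
    action = [mu + math.exp(ls) * rng.gauss(0.0, 1.0) for mu, ls in zip(mean, log_std)]
    if action_bounds is not None:
        low, high = action_bounds
        action = [clamp(a, low, high) for a in action]
    return action, diag_gaussian_log_prob(action, mean, log_std)


rng = random.Random(0)
# the second log std (-25) is clamped to -20, so that dimension is almost deterministic
action, log_prob = gaussian_act(mean=[0.0, 0.5], log_std=[0.0, -25.0], rng=rng,
                                action_bounds=(-1.0, 1.0))
print(action, log_prob)
```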

Basic usage

  • Multi-Layer Perceptron (MLP)

  • Convolutional Neural Network (CNN)

  • Recurrent Neural Network (RNN)

  • Gated Recurrent Unit RNN (GRU)

  • Long Short-Term Memory RNN (LSTM)

import torch
import torch.nn as nn

from skrl.models.torch import Model, MultivariateGaussianMixin


# define the model
class MLP(MultivariateGaussianMixin, Model):
    def __init__(self, observation_space, action_space, device,
                 clip_actions=False, clip_log_std=True, min_log_std=-20, max_log_std=2):
        Model.__init__(self, observation_space, action_space, device)
        MultivariateGaussianMixin.__init__(self, clip_actions, clip_log_std, min_log_std, max_log_std)

        # mean actions, squashed to [-1, 1] by the final Tanh
        self.net = nn.Sequential(nn.Linear(self.num_observations, 64),
                                 nn.ReLU(),
                                 nn.Linear(64, 32),
                                 nn.ReLU(),
                                 nn.Linear(32, self.num_actions),
                                 nn.Tanh())

        # state-independent log standard deviations, one per action dimension (initial std = 1)
        self.log_std_parameter = nn.Parameter(torch.zeros(self.num_actions))

    def compute(self, inputs, role):
        # mean actions, log standard deviations and a dictionary of extra outputs
        return self.net(inputs["states"]), self.log_std_parameter, {}


# instantiate the model (assumes there is a wrapped environment: env)
policy = MLP(observation_space=env.observation_space,
             action_space=env.action_space,
             device=env.device,
             clip_actions=True,
             clip_log_std=True,
             min_log_std=-20,
             max_log_std=2)
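To see what compute hands to the mixin, the sketch below rebuilds the same layers with plain PyTorch (no skrl, no environment) for assumed sizes of 60 observations and 8 actions. The state-independent log_std_parameter has shape (num_actions,) and is shared by every state in the batch. Using torch.diag(log_std.exp()) as the Cholesky factor is an illustrative choice, not necessarily how skrl builds its distribution.

```python
import torch
import torch.nn as nn
from torch.distributions import MultivariateNormal

num_observations, num_actions, batch_size = 60, 8, 4  # assumed sizes

# same layers as the MLP example
net = nn.Sequential(nn.Linear(num_observations, 64),
                    nn.ReLU(),
                    nn.Linear(64, 32),
                    nn.ReLU(),
                    nn.Linear(32, num_actions),
                    nn.Tanh())
log_std_parameter = nn.Parameter(torch.zeros(num_actions))

states = torch.randn(batch_size, num_observations)
mean_actions = net(states)       # (batch_size, num_actions), each value in [-1, 1]
log_std = log_std_parameter      # (num_actions,), shared by every state

# a (num_actions, num_actions) diagonal scale broadcasts against the batch of means
distribution = MultivariateNormal(mean_actions, scale_tril=torch.diag(log_std.exp()))
actions = distribution.rsample()  # (batch_size, num_actions), differentiable sample
print(mean_actions.shape, log_std.shape, actions.shape)
```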

API

class skrl.models.torch.multivariate_gaussian.MultivariateGaussianMixin(clip_actions: bool = False, clip_log_std: bool = True, min_log_std: float = -20, max_log_std: float = 2, role: str = '')

Bases: object

__init__(clip_actions: bool = False, clip_log_std: bool = True, min_log_std: float = -20, max_log_std: float = 2, role: str = '') → None

Multivariate Gaussian mixin model (stochastic model)

Parameters
  • clip_actions (bool, optional) – Flag to indicate whether the actions should be clipped to the action space (default: False)

  • clip_log_std (bool, optional) – Flag to indicate whether the log standard deviations should be clipped (default: True)

  • min_log_std (float, optional) – Minimum value of the log standard deviation if clip_log_std is True (default: -20)

  • max_log_std (float, optional) – Maximum value of the log standard deviation if clip_log_std is True (default: 2)

  • role (str, optional) – Role played by the model (default: "")
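Because the clipping applies to the logarithm, the defaults bound the standard deviation itself to the interval [e^-20, e^2]. The arithmetic below makes the range concrete; clip_log_std here is a hypothetical helper that mimics an element-wise clamp, not a skrl function.

```python
import math

min_log_std, max_log_std = -20.0, 2.0  # the mixin's default clipping range

# clipping log(sigma) to [min_log_std, max_log_std] bounds sigma to [e^-20, e^2]
min_std = math.exp(min_log_std)
max_std = math.exp(max_log_std)
print(f"{min_std:.3e} <= sigma <= {max_std:.3f}")  # 2.061e-09 <= sigma <= 7.389


def clip_log_std(log_std, low=min_log_std, high=max_log_std):
    """Hypothetical element-wise clamp of log standard deviations."""
    return [max(low, min(high, value)) for value in log_std]


print(clip_log_std([-30.0, 0.0, 5.0]))  # [-20.0, 0.0, 2.0]
```

The lower bound keeps the standard deviation strictly positive, while the upper bound prevents exploration noise from growing without limit.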

Example:

# define the model
>>> import torch
>>> import torch.nn as nn
>>> from skrl.models.torch import Model, MultivariateGaussianMixin
>>>
>>> class Policy(MultivariateGaussianMixin, Model):
...     def __init__(self, observation_space, action_space, device="cuda:0",
...                  clip_actions=False, clip_log_std=True, min_log_std=-20, max_log_std=2):
...         Model.__init__(self, observation_space, action_space, device)
...         MultivariateGaussianMixin.__init__(self, clip_actions, clip_log_std, min_log_std, max_log_std)
...
...         self.net = nn.Sequential(nn.Linear(self.num_observations, 32),
...                                  nn.ELU(),
...                                  nn.Linear(32, 32),
...                                  nn.ELU(),
...                                  nn.Linear(32, self.num_actions))
...         self.log_std_parameter = nn.Parameter(torch.zeros(self.num_actions))
...
...     def compute(self, inputs, role):
...         return self.net(inputs["states"]), self.log_std_parameter, {}
...
>>> # given an observation_space: gym.spaces.Box with shape (60,)
>>> # and an action_space: gym.spaces.Box with shape (8,)
>>> model = Policy(observation_space, action_space)
>>>
>>> print(model)
Policy(
  (net): Sequential(
    (0): Linear(in_features=60, out_features=32, bias=True)
    (1): ELU(alpha=1.0)
    (2): Linear(in_features=32, out_features=32, bias=True)
    (3): ELU(alpha=1.0)
    (4): Linear(in_features=32, out_features=8, bias=True)
  )
)
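print(model) lists only registered submodules, so log_std_parameter, which is assigned directly as an nn.Parameter, does not appear in the summary even though it is trained along with the network. The arithmetic below counts the trainable values for the sizes in this example (60 observations, 8 actions). With a model instance at hand, sum(p.numel() for p in model.parameters()) should report the same total.

```python
# (in_features, out_features) of the Linear layers printed above
layers = [(60, 32), (32, 32), (32, 8)]

# each Linear layer has in * out weights plus out biases; ELU layers have no parameters
linear_params = sum(fan_in * fan_out + fan_out for fan_in, fan_out in layers)
log_std_params = 8  # nn.Parameter(torch.zeros(num_actions)) with num_actions = 8

print(linear_params, linear_params + log_std_params)  # 3272 3280
```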

act(inputs: Mapping[str, Union[torch.Tensor, Any]], role: str = '') → Tuple[torch.Tensor, Optional[torch.Tensor], Mapping[str, Union[torch.Tensor, Any]]]

Act stochastically in response to the state of the environment

Parameters
  • inputs (dict where the values are typically torch.Tensor) –

    Model inputs. The most common keys are:

    • "states": state of the environment used to make the decision

    • "taken_actions": actions taken by the policy for the given states

  • role (str, optional) – Role played by the model (default: "")

Returns

Model output. The first component is the action to be taken by the agent. The second component is the log of the probability density function, evaluated at the sampled actions (or at "taken_actions", if provided). The third component is a dictionary containing the mean actions ("mean_actions") and extra output values.

Return type

tuple of torch.Tensor, torch.Tensor or None, and dictionary

Example:

>>> # given a batch of sample states with shape (4096, 60)
>>> actions, log_prob, outputs = model.act({"states": states})
>>> print(actions.shape, log_prob.shape, outputs["mean_actions"].shape)
torch.Size([4096, 8]) torch.Size([4096, 1]) torch.Size([4096, 8])
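The log_prob above has shape (4096, 1) rather than (4096,): MultivariateNormal.log_prob returns one value per sample (summed over the action dimensions), so a trailing dimension is added to align it with the actions tensor. The plain-PyTorch sketch below reproduces that shape handling for an assumed batch of 4 samples with 8 actions and unit standard deviation. It also assumes, as the "taken_actions" key suggests, that the log-density is evaluated at previously taken actions when they are supplied; it is not skrl's code.

```python
import torch
from torch.distributions import MultivariateNormal

batch_size, num_actions = 4, 8
mean_actions = torch.zeros(batch_size, num_actions)
distribution = MultivariateNormal(mean_actions, scale_tril=torch.eye(num_actions))

actions = distribution.sample()
log_prob = distribution.log_prob(actions)  # one value per sample: (batch_size,)
if log_prob.dim() != actions.dim():
    log_prob = log_prob.unsqueeze(-1)      # (batch_size, 1), aligned with actions

# evaluate previously taken actions instead of new samples (e.g. during a policy update)
inputs = {"taken_actions": torch.zeros(batch_size, num_actions)}
taken_log_prob = distribution.log_prob(inputs.get("taken_actions", actions)).unsqueeze(-1)
print(log_prob.shape, taken_log_prob.shape)
```

At the mean of an 8-dimensional standard normal, each value in taken_log_prob equals -4 ln(2π), approximately -7.3515.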

distribution(role: str = '') → torch.distributions.multivariate_normal.MultivariateNormal

Get the current distribution of the model

Returns

Distribution of the model

Return type

torch.distributions.MultivariateNormal

Parameters

role (str, optional) – Role played by the model (default: "")

Example:

>>> distribution = model.distribution()
>>> print(distribution)
MultivariateNormal(loc: torch.Size([4096, 8]), scale_tril: torch.Size([4096, 8, 8]))
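The repr shows one 8-dimensional distribution per sample, each with a full 8 × 8 scale_tril matrix. When that matrix is diagonal (as it is when built from a vector of per-dimension standard deviations), the multivariate normal factorizes into independent per-dimension normals, which is a convenient way to reason about its log-density and entropy. The plain-PyTorch check below uses arbitrary assumed values and only torch.distributions classes; it does not call skrl.

```python
import torch
from torch.distributions import Independent, MultivariateNormal, Normal

torch.manual_seed(0)
loc = torch.randn(4, 8)
std = torch.rand(8) + 0.5                       # assumed per-dimension standard deviations

mvn = MultivariateNormal(loc, scale_tril=torch.diag(std))
factorized = Independent(Normal(loc, std), 1)  # 8 independent normals per sample

x = torch.randn(4, 8)
print(mvn.loc.shape, mvn.scale_tril.shape)      # torch.Size([4, 8]) torch.Size([4, 8, 8])
print(torch.allclose(mvn.log_prob(x), factorized.log_prob(x), atol=1e-5))
print(torch.allclose(mvn.entropy(), factorized.entropy(), atol=1e-5))
```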

get_entropy(role: str = '') → torch.Tensor

Compute and return the entropy of the model

Returns

Entropy of the model

Return type

torch.Tensor

Parameters

role (str, optional) – Role played by the model (default: "")

Example:

>>> entropy = model.get_entropy()
>>> print(entropy.shape)
torch.Size([4096])
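For a Gaussian with diagonal covariance and standard deviations σ_i, the entropy has a closed form that depends only on the log standard deviations, not on the mean: H = (k/2)(1 + ln 2π) + Σ_i ln σ_i, where k is the number of action dimensions. A stdlib sketch of that formula (diag_gaussian_entropy is a hypothetical helper name):

```python
import math


def diag_gaussian_entropy(log_std):
    """Entropy (in nats) of a Gaussian with diagonal covariance, given per-dimension log std."""
    k = len(log_std)
    return 0.5 * k * (1.0 + math.log(2.0 * math.pi)) + sum(log_std)


# 8 action dimensions with log_std = 0 (sigma = 1), as with a zero-initialized parameter
print(round(diag_gaussian_entropy([0.0] * 8), 4))   # 11.3515
# lowering every log std by 1 lowers the entropy by exactly 8 nats
print(round(diag_gaussian_entropy([-1.0] * 8), 4))  # 3.3515
```

Because the entropy is linear in the log standard deviations, an entropy bonus during training acts directly on those values.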

get_log_std(role: str = '') → torch.Tensor

Return the log standard deviation of the model

Returns

Log standard deviation of the model

Return type

torch.Tensor

Parameters

role (str, optional) – Role played by the model (default: "")

Example:

>>> log_std = model.get_log_std()
>>> print(log_std.shape)
torch.Size([4096, 8])
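In the examples on this page, log_std_parameter holds one value per action dimension, shape (8,), yet get_log_std returns a (4096, 8) tensor with one row per sample of the most recent batch. A minimal plain-PyTorch sketch of that expansion, assuming it is a simple repetition of the shared vector (the actual skrl internals may differ):

```python
import torch

num_samples, num_actions = 4096, 8
log_std_parameter = torch.nn.Parameter(torch.zeros(num_actions))

# repeat the shared per-action log std once per sample in the batch
log_std = log_std_parameter.repeat(num_samples, 1)
print(log_std.shape)     # torch.Size([4096, 8])
print(log_std.exp()[0])  # standard deviations of the first sample (all ones here)
```

The repeated tensor still tracks gradients, so losses computed from it update the single shared parameter.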