Model instantiators

Utilities for quickly creating model instances.



Models                                            pytorch            jax
------------------------------------------------  -----------------  -----------------
Tabular model (discrete domain)                   \(\square\)        \(\square\)
Categorical model (discrete domain)               \(\blacksquare\)   \(\blacksquare\)
Multi-Categorical model (discrete domain)         \(\blacksquare\)   \(\blacksquare\)
Gaussian model (continuous domain)                \(\blacksquare\)   \(\blacksquare\)
Multivariate Gaussian model (continuous domain)   \(\blacksquare\)   \(\square\)
Deterministic model (continuous domain)           \(\blacksquare\)   \(\blacksquare\)
Shared model                                      \(\blacksquare\)   \(\square\)


Network definitions

The network is composed of one or more containers. Each container specifies its input, hidden layers and activation functions.

Implementation details:

  • The forward pass calls the containers in the order in which they are defined

  • Containers use torch.nn.Sequential in PyTorch, and flax.linen.Sequential in JAX

  • If a single activation function is specified (mapping or sequence), it will be applied after each layer (except flatten layers) in the container

network:
  - name: <NAME>  # container name
    input: <INPUT>  # container input (certain operations are supported)
    layers:  # list of supported layers
      - <LAYER 1>
      - ...
      - <LAYER N>
    activations:  # list of supported activation functions
      - <ACTIVATION 1>
      - ...
      - <ACTIVATION N>
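
Since the `network` parameter of the instantiators is a list of dicts, the same structure can also be written directly in Python. The following is a minimal sketch under assumptions: the container name `net`, the layer sizes and the `tanh` activation name are illustrative choices, and `build_value_model` is a hypothetical helper (not part of skrl) that imports the library lazily so the definition can be inspected without skrl installed.

```python
from typing import Any, Dict, List

# One container named "net" (illustrative name) fed by the OBSERVATIONS token.
network: List[Dict[str, Any]] = [
    {
        "name": "net",
        "input": "OBSERVATIONS",
        "layers": [64, 64],       # two linear layers with 64 output features each
        "activations": ["tanh"],  # a single activation, applied after each layer
    }
]


def build_value_model(observation_space, action_space, device=None):
    """Hypothetical helper: pass the definition to the PyTorch instantiator."""
    # Imported lazily so that this module loads even if skrl is not installed.
    from skrl.utils.model_instantiators.torch import deterministic_model

    return deterministic_model(
        observation_space=observation_space,
        action_space=action_space,
        device=device,
        network=network,
        output="ONE",  # one output feature
    )
```

Keeping the definition as plain data makes it easy to load the same structure from a YAML file or to reuse it across several models.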

Inputs

Inputs can be specified using tokens or the outputs of previously defined containers (referenced by container name). Certain operations can be applied to them, including indexing and slicing.

Hint

Operations can be mixed to create complex input statements

Available tokens:

  • OBSERVATIONS: Token indicating the input states (inputs["states"]) forwarded to the model

  • ACTIONS: Token indicating the input actions (inputs["taken_actions"]) forwarded to the model

  • OBSERVATIONS_ACTIONS: Token indicating the concatenation of the forwarded input states and actions

  • OBSERVATION_SPACE: Token indicating the observation_space of the model

  • ACTION_SPACE: Token indicating the action_space of the model

  • STATES: Alias for OBSERVATIONS (this will change in future versions to distinguish between observation and state spaces)

  • STATES_ACTIONS: Alias for OBSERVATIONS_ACTIONS (this will change in future versions to distinguish between observation and state spaces)

Supported operations:

Operations                                              Example
------------------------------------------------------  --------------------------------------------------
Tensor/array indexing and slicing. E.g.: Box space      OBSERVATIONS[:, 0]
                                                        OBSERVATIONS[:, 2:5]
Dictionary indexing by key. E.g.: Dict space            STATES["joint-pos"]
Arithmetic (+, -, *, /)                                 features_extractor + ACTIONS
Concatenation                                           concatenate([features_extractor, ACTIONS])
Permute dimensions                                      permute(OBSERVATIONS, (0, 3, 1, 2))
One-hot encoding of Discrete and MultiDiscrete spaces   one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)
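
To make the effect of these input expressions concrete, the sketch below mimics slicing, concatenation and one-hot encoding on a small batch. It is only an illustration: nested Python lists stand in for tensors/arrays, and the helper names `slice_columns`, `concat_features` and `one_hot` are hypothetical stand-ins, not the functions skrl uses internally.

```python
def slice_columns(batch, start, stop):
    """Mimic OBSERVATIONS[:, start:stop]: keep a column range for every sample."""
    return [row[start:stop] for row in batch]


def concat_features(parts):
    """Mimic concatenate([...]): join per-sample features along the last axis."""
    return [sum(rows, []) for rows in zip(*parts)]


def one_hot(n, values):
    """Mimic one-hot encoding for a Discrete space with n categories."""
    return [[1.0 if i == v else 0.0 for i in range(n)] for v in values]


observations = [[0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
                [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]]
actions = [[9.0], [8.0]]

print(slice_columns(observations, 2, 5))
# [[0.2, 0.3, 0.4], [1.2, 1.3, 1.4]]
print(concat_features([slice_columns(observations, 0, 2), actions]))
# [[0.0, 0.1, 9.0], [1.0, 1.1, 8.0]]
print(one_hot(3, [2, 0]))
# [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
```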


Output

The output can be specified using tokens or the outputs of defined containers (referenced by container name). Certain operations can be applied to it.

Note

If a token is used, a linear layer will be created that takes the output of the last container in the list as its input (which sets the number of input features) and uses the value represented by the token as the number of output features

Hint

Operations can be mixed to create complex output statements

Available tokens:

  • ACTIONS: Token indicating that the output shape is the number of elements in the action space

  • ONE: Token indicating that the output shape is 1

Supported operations:

Operations                Example
------------------------  ----------------------------------------
Activation function       tanh(ACTIONS)
Arithmetic (+, -, *, /)   features_extractor + ONE
Concatenation             concatenate([features_extractor, net])
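
The sizing rule for output tokens can be sketched as a small function. This is an illustration of the rule described in the note above, not skrl's implementation; `implied_output_layer` and its arguments are hypothetical names.

```python
def implied_output_layer(last_container_features, token, num_actions):
    """Return (in_features, out_features) of the linear layer implied by an output token."""
    sizes = {"ACTIONS": num_actions, "ONE": 1}
    if token not in sizes:
        raise ValueError(f"Unsupported output token: {token}")
    return last_container_features, sizes[token]


# Last container ends with 64 features; the action space has 4 elements.
print(implied_output_layer(64, "ACTIONS", num_actions=4))  # (64, 4)
print(implied_output_layer(64, "ONE", num_actions=4))      # (64, 1)
```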


Activation functions

The following table lists the supported activation functions:


Layers

The following table lists the supported layers and transformations:

Layers    pytorch            jax
--------  -----------------  -------------------
linear    torch.nn.Linear    flax.linen.Dense
conv2d    torch.nn.Conv2d    flax.linen.Conv
flatten   torch.nn.Flatten   jax.numpy.reshape


linear

Apply a linear transformation (torch.nn.Linear in PyTorch, flax.linen.Dense in JAX)

Note

The tokens STATES (number of elements in the observation/state space), ACTIONS (number of elements in the action space), STATES_ACTIONS (the sum of the number of elements of the observation/state space and of the action space) and ONE (1) can be used as the layer’s number of input/output features

Note

If PyTorch’s in_features parameter is not specified, it is inferred by using the torch.nn.LazyLinear module

Position  pytorch        jax        Type   Required           Description
--------  -------------  ---------  -----  -----------------  --------------------------
-         in_features    -          int    \(\square\)        Number of input features
0         out_features   features   int    \(\blacksquare\)   Number of output features
1         bias           use_bias   bool   \(\square\)        Whether to add a bias

layers:
  - 32

conv2d

Apply a 2D convolution (torch.nn.Conv2d in PyTorch, flax.linen.Conv in JAX)

Warning

  • PyTorch torch.nn.Conv2d expects the input to be in the form NCHW (N: batch, C: channels, H: height, W: width). A permutation operation may be necessary to modify the dimensions of a batch of images which are typically NHWC.

  • JAX flax.linen.Conv expects the input to be in the form NHWC (the typical dimensions of a batch of images).
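
The effect of the permutation `(0, 3, 1, 2)` used to go from NHWC to NCHW can be checked on shapes alone. The sketch below is illustrative: `permute_shape` is a hypothetical helper and the batch shape is an example value.

```python
def permute_shape(shape, dims):
    """Shape after a permute: output dimension i takes input dimension dims[i]."""
    return tuple(shape[d] for d in dims)


nhwc = (16, 84, 84, 3)                     # batch of 16 RGB images, 84x84 pixels
nchw = permute_shape(nhwc, (0, 3, 1, 2))   # same dims as permute(OBSERVATIONS, (0, 3, 1, 2))
print(nchw)  # (16, 3, 84, 84)
```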

Note

If PyTorch’s in_channels parameter is not specified, it is inferred by using the torch.nn.LazyConv2d module

Position  pytorch        jax           Type                   Required           Description
--------  -------------  ------------  ---------------------  -----------------  ------------------------------------
-         in_channels    -             int                    \(\square\)        Number of input channels
0         out_channels   features      int                    \(\blacksquare\)   Number of output channels (filters)
1         kernel_size    kernel_size   int, tuple[int]        \(\blacksquare\)   Convolutional kernel size
2         stride         strides       int, tuple[int]        \(\square\)        Inter-window strides
3         padding        padding       str, int, tuple[int]   \(\square\)        Padding added to all dimensions
4         bias           use_bias      bool                   \(\square\)        Whether to add a bias

layers:
  - conv2d: [32, 8, [4, 4]]
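
To read the positional form above, the sketch below maps the list onto the PyTorch parameter names in the order given in the table (an assumption about how positions are matched) and computes the resulting spatial size with the standard convolution formula for integer padding and no dilation. The 84x84 input size and the helper `conv2d_output_size` are illustrative.

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a 2D convolution (dilation 1, integer padding)."""
    return (size + 2 * padding - kernel_size) // stride + 1


positional = [32, 8, [4, 4]]
names = ["out_channels", "kernel_size", "stride", "padding", "bias"]
params = dict(zip(names, positional))  # unspecified trailing parameters are omitted
print(params)  # {'out_channels': 32, 'kernel_size': 8, 'stride': [4, 4]}

# Output shape (channels, height, width) for an 84x84 input image.
height = conv2d_output_size(84, params["kernel_size"], params["stride"][0])
width = conv2d_output_size(84, params["kernel_size"], params["stride"][1])
print((params["out_channels"], height, width))  # (32, 20, 20)
```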

flatten

Flatten a contiguous range of dimensions (torch.nn.Flatten in PyTorch, jax.numpy.reshape operation in JAX)

Position  pytorch     jax   Type   Required      Description
--------  ----------  ----  -----  ------------  ---------------------------
0         start_dim   -     int    \(\square\)   First dimension to flatten
1         end_dim     -     int    \(\square\)   Last dimension to flatten

layers:
  - flatten
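
The shape produced by a flatten layer can be sketched as follows. The defaults `start_dim=1` and `end_dim=-1` are those documented for torch.nn.Flatten; whether the JAX path uses the same defaults is an assumption here, and `flatten_shape` is a hypothetical helper.

```python
from math import prod


def flatten_shape(shape, start_dim=1, end_dim=-1):
    """Shape after flattening dimensions start_dim..end_dim (inclusive)."""
    start = start_dim % len(shape)
    end = end_dim % len(shape)
    return shape[:start] + (prod(shape[start:end + 1]),) + shape[end + 1:]


print(flatten_shape((16, 32, 20, 20)))               # (16, 12800)
print(flatten_shape((16, 32, 20, 20), start_dim=2))  # (16, 32, 400)
```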

API (PyTorch)

skrl.utils.model_instantiators.torch.categorical_model(observation_space: int | Tuple[int] | gymnasium.Space | None = None, action_space: int | Tuple[int] | gymnasium.Space | None = None, device: str | torch.device | None = None, unnormalized_log_prob: bool = True, network: Sequence[Mapping[str, Any]] = [], output: str | Sequence[str] = '', return_source: bool = False, *args, **kwargs) -> Model | str

Instantiate a categorical model

Parameters:
  • observation_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Observation/state space or shape (default: None). If it is not None, the num_observations property will contain the size of that space

  • action_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Action space or shape (default: None). If it is not None, the num_actions property will contain the size of that space

  • device (str or torch.device, optional) – Device on which a tensor/array is or will be allocated (default: None). If None, the device will be either "cuda" if available or "cpu"

  • unnormalized_log_prob (bool, optional) – Flag indicating how the model’s output is interpreted (default: True). If True, the output is interpreted as unnormalized log probabilities (it can be any real number); otherwise, as normalized probabilities (the output must be non-negative, finite and have a non-zero sum)

  • network (list of dict, optional) – Network definition (default: [])

  • output (list or str, optional) – Output expression (default: "")

  • return_source (bool, optional) – Whether to return the source string containing the model class used to instantiate the model rather than the model instance (default: False).

Returns:

Categorical model instance or definition source

Return type:

Model
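
The two interpretations selected by `unnormalized_log_prob` can be illustrated in plain Python. This sketch assumes the behavior mirrors the `logits` and `probs` arguments of torch.distributions.Categorical; the helper names are hypothetical and the code is not skrl's implementation.

```python
import math


def probs_from_logits(logits):
    """unnormalized_log_prob=True: any real numbers, normalized with a softmax."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def probs_from_weights(weights):
    """unnormalized_log_prob=False: non-negative, finite values with non-zero sum."""
    if any(w < 0 or not math.isfinite(w) for w in weights):
        raise ValueError("values must be non-negative and finite")
    total = sum(weights)
    if total == 0:
        raise ValueError("values must have a non-zero sum")
    return [w / total for w in weights]


print(probs_from_logits([0.0, 0.0]))   # [0.5, 0.5]
print(probs_from_weights([1.0, 3.0]))  # [0.25, 0.75]
```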

skrl.utils.model_instantiators.torch.multicategorical_model(observation_space: int | Tuple[int] | gymnasium.Space | None = None, action_space: int | Tuple[int] | gymnasium.Space | None = None, device: str | torch.device | None = None, unnormalized_log_prob: bool = True, reduction: str = 'sum', network: Sequence[Mapping[str, Any]] = [], output: str | Sequence[str] = '', return_source: bool = False, *args, **kwargs) -> Model | str

Instantiate a multi-categorical model

Parameters:
  • observation_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Observation/state space or shape (default: None). If it is not None, the num_observations property will contain the size of that space

  • action_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Action space or shape (default: None). If it is not None, the num_actions property will contain the size of that space

  • device (str or torch.device, optional) – Device on which a tensor/array is or will be allocated (default: None). If None, the device will be either "cuda" if available or "cpu"

  • unnormalized_log_prob (bool, optional) – Flag indicating how the model’s output is interpreted (default: True). If True, the output is interpreted as unnormalized log probabilities (it can be any real number); otherwise, as normalized probabilities (the output must be non-negative, finite and have a non-zero sum)

  • reduction (str, optional) – Reduction method for returning the log probability density function (default: "sum"). Supported values are "mean", "sum", "prod" and "none". If "none", the log probability density function is returned as a tensor of shape (num_samples, num_actions) instead of (num_samples, 1)

  • network (list of dict, optional) – Network definition (default: [])

  • output (list or str, optional) – Output expression (default: "")

  • return_source (bool, optional) – Whether to return the source string containing the model class used to instantiate the model rather than the model instance (default: False).

Returns:

Multi-Categorical model instance or definition source

Return type:

Model

skrl.utils.model_instantiators.torch.deterministic_model(observation_space: int | Tuple[int] | gymnasium.Space | None = None, action_space: int | Tuple[int] | gymnasium.Space | None = None, device: str | torch.device | None = None, clip_actions: bool = False, network: Sequence[Mapping[str, Any]] = [], output: str | Sequence[str] = '', return_source: bool = False, *args, **kwargs) -> Model | str

Instantiate a deterministic model

Parameters:
  • observation_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Observation/state space or shape (default: None). If it is not None, the num_observations property will contain the size of that space

  • action_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Action space or shape (default: None). If it is not None, the num_actions property will contain the size of that space

  • device (str or torch.device, optional) – Device on which a tensor/array is or will be allocated (default: None). If None, the device will be either "cuda" if available or "cpu"

  • clip_actions (bool, optional) – Flag to indicate whether the actions should be clipped (default: False)

  • network (list of dict, optional) – Network definition (default: [])

  • output (list or str, optional) – Output expression (default: "")

  • return_source (bool, optional) – Whether to return the source string containing the model class used to instantiate the model rather than the model instance (default: False).

Returns:

Deterministic model instance or definition source

Return type:

Model

skrl.utils.model_instantiators.torch.gaussian_model(observation_space: int | Tuple[int] | gymnasium.Space | None = None, action_space: int | Tuple[int] | gymnasium.Space | None = None, device: str | torch.device | None = None, clip_actions: bool = False, clip_log_std: bool = True, min_log_std: float = -20, max_log_std: float = 2, reduction: str = 'sum', initial_log_std: float = 0, fixed_log_std: bool = False, network: Sequence[Mapping[str, Any]] = [], output: str | Sequence[str] = '', return_source: bool = False, *args, **kwargs) -> Model | str

Instantiate a Gaussian model

Parameters:
  • observation_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Observation/state space or shape (default: None). If it is not None, the num_observations property will contain the size of that space

  • action_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Action space or shape (default: None). If it is not None, the num_actions property will contain the size of that space

  • device (str or torch.device, optional) – Device on which a tensor/array is or will be allocated (default: None). If None, the device will be either "cuda" if available or "cpu"

  • clip_actions (bool, optional) – Flag to indicate whether the actions should be clipped (default: False)

  • clip_log_std (bool, optional) – Flag to indicate whether the log standard deviations should be clipped (default: True)

  • min_log_std (float, optional) – Minimum value of the log standard deviation (default: -20)

  • max_log_std (float, optional) – Maximum value of the log standard deviation (default: 2)

  • reduction (str, optional) – Reduction method for returning the log probability density function (default: "sum"). Supported values are "mean", "sum", "prod" and "none". If "none", the log probability density function is returned as a tensor of shape (num_samples, num_actions) instead of (num_samples, 1)

  • initial_log_std (float, optional) – Initial value for the log standard deviation (default: 0)

  • fixed_log_std (bool, optional) – Whether the log standard deviation parameter should be fixed (default: False). Fixed parameters have the gradient computation deactivated

  • network (list of dict, optional) – Network definition (default: [])

  • output (list or str, optional) – Output expression (default: "")

  • return_source (bool, optional) – Whether to return the source string containing the model class used to instantiate the model rather than the model instance (default: False).

Returns:

Gaussian model instance or definition source

Return type:

Model
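
The interplay of `clip_log_std`, `min_log_std`, `max_log_std` and `reduction` can be sketched for a single sample in plain Python. This is an illustration of the parameter semantics described above, using the standard Gaussian log-density; `gaussian_log_prob` is a hypothetical helper, not the library's code.

```python
import math


def gaussian_log_prob(actions, mean, log_std, reduction="sum",
                      clip_log_std=True, min_log_std=-20.0, max_log_std=2.0):
    """Log-density of one sample under independent Gaussians, then reduced."""
    per_dim = []
    for a, mu, ls in zip(actions, mean, log_std):
        if clip_log_std:
            ls = max(min_log_std, min(max_log_std, ls))
        variance = math.exp(2.0 * ls)
        per_dim.append(-((a - mu) ** 2) / (2.0 * variance) - ls
                       - 0.5 * math.log(2.0 * math.pi))
    if reduction == "none":
        return per_dim  # one value per action dimension
    if reduction == "sum":
        return sum(per_dim)
    if reduction == "mean":
        return sum(per_dim) / len(per_dim)
    if reduction == "prod":
        return math.prod(per_dim)
    raise ValueError(f"Unsupported reduction: {reduction}")


# Two action dimensions, sample at the mean, unit standard deviation.
print(gaussian_log_prob([0.0, 0.0], [0.0, 0.0], [0.0, 0.0], reduction="none"))
print(gaussian_log_prob([0.0, 0.0], [0.0, 0.0], [0.0, 0.0], reduction="sum"))
```

With `reduction="none"` the result keeps one entry per action dimension, which corresponds to the (num_samples, num_actions) shape mentioned above; the other reductions collapse it to a single value per sample.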

skrl.utils.model_instantiators.torch.multivariate_gaussian_model(observation_space: int | Tuple[int] | gymnasium.Space | None = None, action_space: int | Tuple[int] | gymnasium.Space | None = None, device: str | torch.device | None = None, clip_actions: bool = False, clip_log_std: bool = True, min_log_std: float = -20, max_log_std: float = 2, initial_log_std: float = 0, fixed_log_std: bool = False, network: Sequence[Mapping[str, Any]] = [], output: str | Sequence[str] = '', return_source: bool = False, *args, **kwargs) -> Model | str

Instantiate a multivariate Gaussian model

Parameters:
  • observation_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Observation/state space or shape (default: None). If it is not None, the num_observations property will contain the size of that space

  • action_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Action space or shape (default: None). If it is not None, the num_actions property will contain the size of that space

  • device (str or torch.device, optional) – Device on which a tensor/array is or will be allocated (default: None). If None, the device will be either "cuda" if available or "cpu"

  • clip_actions (bool, optional) – Flag to indicate whether the actions should be clipped (default: False)

  • clip_log_std (bool, optional) – Flag to indicate whether the log standard deviations should be clipped (default: True)

  • min_log_std (float, optional) – Minimum value of the log standard deviation (default: -20)

  • max_log_std (float, optional) – Maximum value of the log standard deviation (default: 2)

  • initial_log_std (float, optional) – Initial value for the log standard deviation (default: 0)

  • fixed_log_std (bool, optional) – Whether the log standard deviation parameter should be fixed (default: False). Fixed parameters have the gradient computation deactivated

  • network (list of dict, optional) – Network definition (default: [])

  • output (list or str, optional) – Output expression (default: "")

  • return_source (bool, optional) – Whether to return the source string containing the model class used to instantiate the model rather than the model instance (default: False).

Returns:

Multivariate Gaussian model instance or definition source

Return type:

Model

skrl.utils.model_instantiators.torch.shared_model(observation_space: int | Tuple[int] | gymnasium.Space | None = None, action_space: int | Tuple[int] | gymnasium.Space | None = None, device: str | torch.device | None = None, structure: Sequence[str] = ['GaussianMixin', 'DeterministicMixin'], roles: Sequence[str] = [], parameters: Sequence[Mapping[str, Any]] = [], single_forward_pass: bool = True, return_source: bool = False) -> Model | str

Instantiate a shared model

Parameters:
  • observation_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Observation/state space or shape (default: None). If it is not None, the num_observations property will contain the size of that space

  • action_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Action space or shape (default: None). If it is not None, the num_actions property will contain the size of that space

  • device (str or torch.device, optional) – Device on which a tensor/array is or will be allocated (default: None). If None, the device will be either "cuda" if available or "cpu"

  • structure (sequence of strings, optional) – Shared model structure (default: Gaussian-Deterministic).

  • roles (sequence of strings, optional) – Organized list of model roles (default: [])

  • parameters (sequence of dict, optional) – Organized list of model instantiator parameters (default: [])

  • single_forward_pass (bool, optional) – Whether to perform a single forward-pass for the shared layers/network (default: True)

  • return_source (bool, optional) – Whether to return the source string containing the model class used to instantiate the model rather than the model instance (default: False).

Returns:

Shared model instance or definition source

Return type:

Model
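
A possible way to assemble the arguments of a shared model is sketched below. Several points are assumptions rather than documented facts: the role names "policy" and "value", the reading that each entry of `parameters` holds the instantiator parameters for the matching entry of `structure`, and the hypothetical helper `build_shared_model`, which imports skrl lazily.

```python
# Network shared by both roles (illustrative container name and sizes).
shared_network = [
    {"name": "net", "input": "OBSERVATIONS", "layers": [64, 64], "activations": ["tanh"]}
]

structure = ["GaussianMixin", "DeterministicMixin"]  # default from the signature
roles = ["policy", "value"]                          # assumed role names
parameters = [
    # Assumed: Gaussian instantiator parameters for the first role.
    {"clip_actions": False, "clip_log_std": True, "min_log_std": -20,
     "max_log_std": 2, "initial_log_std": 0,
     "network": shared_network, "output": "ACTIONS"},
    # Assumed: deterministic instantiator parameters for the second role.
    {"clip_actions": False, "network": shared_network, "output": "ONE"},
]


def build_shared_model(observation_space, action_space, device=None):
    """Hypothetical helper: forward the lists above to the PyTorch instantiator."""
    # Imported lazily so that this module loads even if skrl is not installed.
    from skrl.utils.model_instantiators.torch import shared_model

    return shared_model(
        observation_space=observation_space,
        action_space=action_space,
        device=device,
        structure=structure,
        roles=roles,
        parameters=parameters,
        single_forward_pass=True,
    )
```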


API (JAX)

skrl.utils.model_instantiators.jax.categorical_model(observation_space: int | Tuple[int] | gymnasium.Space | None = None, action_space: int | Tuple[int] | gymnasium.Space | None = None, device: str | jax.Device | None = None, unnormalized_log_prob: bool = True, network: Sequence[Mapping[str, Any]] = [], output: str | Sequence[str] = '', return_source: bool = False, *args, **kwargs) -> Model | str

Instantiate a categorical model

Parameters:
  • observation_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Observation/state space or shape (default: None). If it is not None, the num_observations property will contain the size of that space

  • action_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Action space or shape (default: None). If it is not None, the num_actions property will contain the size of that space

  • device (str or jax.Device, optional) – Device on which a tensor/array is or will be allocated (default: None). If None, the device will be either "cuda" if available or "cpu"

  • unnormalized_log_prob (bool, optional) – Flag indicating how the model’s output is interpreted (default: True). If True, the output is interpreted as unnormalized log probabilities (it can be any real number); otherwise, as normalized probabilities (the output must be non-negative, finite and have a non-zero sum)

  • network (list of dict, optional) – Network definition (default: [])

  • output (list or str, optional) – Output expression (default: "")

  • return_source (bool, optional) – Whether to return the source string containing the model class used to instantiate the model rather than the model instance (default: False).

Returns:

Categorical model instance or definition source

Return type:

Model

skrl.utils.model_instantiators.jax.multicategorical_model(observation_space: int | Tuple[int] | gymnasium.Space | None = None, action_space: int | Tuple[int] | gymnasium.Space | None = None, device: str | jax.Device | None = None, unnormalized_log_prob: bool = True, reduction: str = 'sum', network: Sequence[Mapping[str, Any]] = [], output: str | Sequence[str] = '', return_source: bool = False, *args, **kwargs) -> Model | str

Instantiate a multi-categorical model

Parameters:
  • observation_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Observation/state space or shape (default: None). If it is not None, the num_observations property will contain the size of that space

  • action_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Action space or shape (default: None). If it is not None, the num_actions property will contain the size of that space

  • device (str or jax.Device, optional) – Device on which a tensor/array is or will be allocated (default: None). If None, the device will be either "cuda" if available or "cpu"

  • unnormalized_log_prob (bool, optional) – Flag indicating how the model’s output is interpreted (default: True). If True, the output is interpreted as unnormalized log probabilities (it can be any real number); otherwise, as normalized probabilities (the output must be non-negative, finite and have a non-zero sum)

  • reduction (str, optional) – Reduction method for returning the log probability density function (default: "sum"). Supported values are "mean", "sum", "prod" and "none". If "none", the log probability density function is returned as a tensor of shape (num_samples, num_actions) instead of (num_samples, 1)

  • network (list of dict, optional) – Network definition (default: [])

  • output (list or str, optional) – Output expression (default: "")

  • return_source (bool, optional) – Whether to return the source string containing the model class used to instantiate the model rather than the model instance (default: False).

Returns:

Multi-Categorical model instance or definition source

Return type:

Model

skrl.utils.model_instantiators.jax.deterministic_model(observation_space: int | Tuple[int] | gymnasium.Space | None = None, action_space: int | Tuple[int] | gymnasium.Space | None = None, device: str | jax.Device | None = None, clip_actions: bool = False, network: Sequence[Mapping[str, Any]] = [], output: str | Sequence[str] = '', return_source: bool = False, *args, **kwargs) -> Model | str

Instantiate a deterministic model

Parameters:
  • observation_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Observation/state space or shape (default: None). If it is not None, the num_observations property will contain the size of that space

  • action_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Action space or shape (default: None). If it is not None, the num_actions property will contain the size of that space

  • device (str or jax.Device, optional) – Device on which a tensor/array is or will be allocated (default: None). If None, the device will be either "cuda" if available or "cpu"

  • clip_actions (bool, optional) – Flag to indicate whether the actions should be clipped (default: False)

  • network (list of dict, optional) – Network definition (default: [])

  • output (list or str, optional) – Output expression (default: "")

  • return_source (bool, optional) – Whether to return the source string containing the model class used to instantiate the model rather than the model instance (default: False).

Returns:

Deterministic model instance or definition source

Return type:

Model

skrl.utils.model_instantiators.jax.gaussian_model(observation_space: int | Tuple[int] | gymnasium.Space | None = None, action_space: int | Tuple[int] | gymnasium.Space | None = None, device: str | jax.Device | None = None, clip_actions: bool = False, clip_log_std: bool = True, min_log_std: float = -20, max_log_std: float = 2, reduction: str = 'sum', initial_log_std: float = 0, fixed_log_std: bool = False, network: Sequence[Mapping[str, Any]] = [], output: str | Sequence[str] = '', return_source: bool = False, *args, **kwargs) -> Model | str

Instantiate a Gaussian model

Parameters:
  • observation_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Observation/state space or shape (default: None). If it is not None, the num_observations property will contain the size of that space

  • action_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Action space or shape (default: None). If it is not None, the num_actions property will contain the size of that space

  • device (str or jax.Device, optional) – Device on which a tensor/array is or will be allocated (default: None). If None, the device will be either "cuda" if available or "cpu"

  • clip_actions (bool, optional) – Flag to indicate whether the actions should be clipped (default: False)

  • clip_log_std (bool, optional) – Flag to indicate whether the log standard deviations should be clipped (default: True)

  • min_log_std (float, optional) – Minimum value of the log standard deviation (default: -20)

  • max_log_std (float, optional) – Maximum value of the log standard deviation (default: 2)

  • reduction (str, optional) – Reduction method for returning the log probability density function (default: "sum"). Supported values are "mean", "sum", "prod" and "none". If "none", the log probability density function is returned as a tensor of shape (num_samples, num_actions) instead of (num_samples, 1)

  • initial_log_std (float, optional) – Initial value for the log standard deviation (default: 0)

  • fixed_log_std (bool, optional) – Whether the log standard deviation parameter should be fixed (default: False). Fixed parameters will be excluded from model parameters.

  • network (list of dict, optional) – Network definition (default: [])

  • output (list or str, optional) – Output expression (default: "")

  • return_source (bool, optional) – Whether to return the source string containing the model class used to instantiate the model rather than the model instance (default: False).

Returns:

Gaussian model instance or definition source

Return type:

Model