Model instantiators¶
Utilities for quickly creating model instances.
Models | PyTorch | JAX |
---|---|---|
Tabular model (discrete domain) | \(\square\) | \(\square\) |
Categorical model (discrete domain) | \(\blacksquare\) | \(\blacksquare\) |
Multi-Categorical model (discrete domain) | \(\blacksquare\) | \(\blacksquare\) |
Gaussian model (continuous domain) | \(\blacksquare\) | \(\blacksquare\) |
Multivariate Gaussian model (continuous domain) | \(\blacksquare\) | \(\square\) |
Deterministic model (continuous domain) | \(\blacksquare\) | \(\blacksquare\) |
Shared model | \(\blacksquare\) | \(\square\) |
Network definitions¶
The network is composed of one or more containers. For each container its input, hidden layers and activation functions are specified.
Implementation details:
- The network compute/forward is done by calling the containers in the order in which they are defined
- Containers use torch.nn.Sequential in PyTorch, and flax.linen.Sequential in JAX
- If a single activation function is specified (mapping or sequence), it will be applied after each layer (except flatten layers) in the container
network:
  - name: <NAME>  # container name
    input: <INPUT>  # container input (certain operations are supported)
    layers:  # list of supported layers
      - <LAYER 1>
      - ...
      - <LAYER N>
    activations:  # list of supported activation functions
      - <ACTIVATION 1>
      - ...
      - <ACTIVATION N>

network=[
    {
        "name": <NAME>,  # container name
        "input": <INPUT>,  # container input (certain operations are supported)
        "layers": [  # list of supported layers
            <LAYER 1>,
            ...,
            <LAYER N>,
        ],
        "activations": [  # list of supported activation functions
            <ACTIVATION 1>,
            ...,
            <ACTIVATION N>,
        ],
    },
]
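As a rough illustration of the implementation details above, the following framework-free sketch expands a single activation so that it follows every layer except flatten, and lists the containers in the order they would be called. The container names, layer sizes and the `layer_kind`/`expand_activations` helpers are hypothetical, and `"elu"` is assumed to be a supported activation name; this is not the library's internal code.

```python
from typing import Any, Mapping


def layer_kind(layer: Any) -> str:
    """Return the layer name for the shorthand forms used in a network definition."""
    if isinstance(layer, int):
        return "linear"  # a bare integer is shorthand for a linear layer
    if isinstance(layer, str):
        return layer  # e.g. "flatten"
    return next(iter(layer))  # e.g. {"conv2d": [...]} -> "conv2d"


def expand_activations(container: Mapping[str, Any]) -> list:
    """Pair each layer with its activation (None for flatten layers)."""
    layers = container["layers"]
    activations = container["activations"]
    if len(activations) == 1:
        # a single activation is applied after each layer except flatten
        return [
            (layer, None if layer_kind(layer) == "flatten" else activations[0])
            for layer in layers
        ]
    # otherwise, one activation per layer is assumed
    return list(zip(layers, activations))


# hypothetical two-container definition (names and sizes are illustrative)
network = [
    {"name": "features", "input": "OBSERVATIONS", "layers": ["flatten", 64], "activations": ["elu"]},
    {"name": "net", "input": "features", "layers": [32, 32], "activations": ["elu"]},
]

# containers are visited in the order in which they are defined
call_order = [container["name"] for container in network]
expanded = {container["name"]: expand_activations(container) for container in network}
print(call_order)
print(expanded["features"])
```

Note that the second container takes the first one's output as input simply by naming it in `input`.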
Inputs¶
Inputs can be specified using tokens or the outputs of previously defined containers (referenced by container name). Certain operations, including indexing and slicing, can be applied to them
Hint
Operations can be mixed to create complex input statements
Available tokens:
- OBSERVATIONS: Token indicating the input states (inputs["states"]) forwarded to the model
- ACTIONS: Token indicating the input actions (inputs["taken_actions"]) forwarded to the model
- OBSERVATIONS_ACTIONS: Token indicating the concatenation of the forwarded input states and actions
- OBSERVATION_SPACE: Token indicating the observation_space of the model
- ACTION_SPACE: Token indicating the action_space of the model
- STATES: Alias for OBSERVATIONS (this is to change in future versions to distinguish between observation and state spaces)
- STATES_ACTIONS: Alias for OBSERVATIONS_ACTIONS (this is to change in future versions to distinguish between observation and state spaces)
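To make the token semantics concrete, here is a minimal sketch that maps the data tokens to the entries of the `inputs` dictionary they stand for. Plain Python lists stand in for tensors/arrays, a single sample is used instead of a batch, and `resolve_input` is a hypothetical helper, not part of the library. The `OBSERVATION_SPACE` and `ACTION_SPACE` tokens are left out because they refer to spaces rather than forwarded data.

```python
from typing import Mapping, Sequence


def resolve_input(token: str, inputs: Mapping[str, Sequence[float]]) -> list:
    """Map an input token to the data it stands for (single sample, no batch dimension)."""
    # STATES and STATES_ACTIONS are currently aliases
    aliases = {"STATES": "OBSERVATIONS", "STATES_ACTIONS": "OBSERVATIONS_ACTIONS"}
    token = aliases.get(token, token)
    if token == "OBSERVATIONS":
        return list(inputs["states"])
    if token == "ACTIONS":
        return list(inputs["taken_actions"])
    if token == "OBSERVATIONS_ACTIONS":
        # concatenation of the forwarded states and actions
        return list(inputs["states"]) + list(inputs["taken_actions"])
    raise ValueError(f"Unsupported token: {token}")


# illustrative values only
inputs = {"states": [0.1, 0.2, 0.3], "taken_actions": [1.0, -1.0]}
print(resolve_input("STATES_ACTIONS", inputs))
```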
Supported operations:
Operations | Example |
---|---|
Tensor/array indexing and slicing | |
Dictionary indexing by key | |
Arithmetic | |
Concatenation | |
Permute dimensions | |
One-hot encoding | |
Output¶
The output can be specified using tokens or the outputs of defined containers (referenced by container name). Certain operations can be applied to it
Note
If a token is used, a linear layer will be created, taking the output size of the last container in the list as its number of input features and the value represented by the token as its number of output features
Hint
Operations can be mixed to create complex output statements
Available tokens:
- ACTIONS: Token indicating that the output shape is the number of elements in the action space
- ONE: Token indicating that the output shape is 1
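The note on output tokens can be sketched as a small shape calculation: the implied linear layer takes its input size from the last container and its output size from the token. The helper name `output_layer_shape` and the sizes used here are assumptions for illustration only.

```python
def output_layer_shape(last_container_features: int, token: str, num_actions: int) -> tuple:
    """Return (in_features, out_features) of the linear layer implied by an output token."""
    token_sizes = {"ACTIONS": num_actions, "ONE": 1}
    return last_container_features, token_sizes[token]


# hypothetical case: the last container ends with 32 features
# and the action space has 4 elements
policy_shape = output_layer_shape(32, "ACTIONS", num_actions=4)
value_shape = output_layer_shape(32, "ONE", num_actions=4)
print(policy_shape, value_shape)
```

A policy would typically use `ACTIONS`, while a state-value function would use `ONE`.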
Supported operations:
Operations | Example |
---|---|
Activation function | |
Arithmetic | |
Concatenation | |
Activation functions¶
The following table lists the supported activation functions:
Activations |
|
|
---|---|---|
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
Layers¶
The following table lists the supported layers and transformations:
Layers |
|
|
---|---|---|
|
||
|
||
|
linear¶
Apply a linear transformation (torch.nn.Linear in PyTorch, flax.linen.Dense in JAX)
Note
The tokens STATES (number of elements in the observation/state space), ACTIONS (number of elements in the action space), STATES_ACTIONS (the sum of the number of elements of the observation/state space and of the action space) and ONE (1) can be used as the layer’s number of input/output features
Note
If PyTorch’s in_features parameter is not specified, it will be inferred by using the torch.nn.LazyLinear module
Position | PyTorch | JAX | Type | Required | Description |
---|---|---|---|---|---|
 | in_features | - | | \(\square\) | Number of input features |
0 | out_features | features | | \(\blacksquare\) | Number of output features |
1 | bias | use_bias | | \(\square\) | Whether to add a bias |
layers:
  - 32

layers:
  - linear: 32

layers:
  - linear: [32]

Hint
The parameter names can be interchanged/mixed between PyTorch and JAX

layers:
  - linear: {out_features: 32}

"layers": [
    32,
]

"layers": [
    {"linear": 32},
]

"layers": [
    {"linear": [32]},
]

Hint
The parameter names can be interchanged/mixed between PyTorch and JAX

"layers": [
    {"linear": {"out_features": 32}},
]
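The shorthand forms above all describe the same layer. The following sketch, which assumes the positional order 0: output features, 1: bias and uses the PyTorch parameter names, normalizes each form to an explicit keyword mapping. `normalize_linear` is a hypothetical helper written for this illustration; the library's own parsing may differ.

```python
from typing import Any


def normalize_linear(layer: Any) -> dict:
    """Convert any of the linear-layer shorthand forms to a keyword mapping."""
    if isinstance(layer, int):
        return {"out_features": layer}  # bare integer
    spec = layer["linear"]
    if isinstance(spec, int):
        return {"out_features": spec}  # {"linear": 32}
    if isinstance(spec, (list, tuple)):
        # positional form; order assumed from the parameter table
        names = ["out_features", "bias"]
        return dict(zip(names, spec))
    return dict(spec)  # already a keyword mapping


forms = [32, {"linear": 32}, {"linear": [32]}, {"linear": {"out_features": 32}}]
normalized = [normalize_linear(form) for form in forms]
print(normalized)
```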
conv2d¶
Apply a 2D convolution (torch.nn.Conv2d in PyTorch, flax.linen.Conv in JAX)
Warning
- PyTorch torch.nn.Conv2d expects the input to be in the form NCHW (N: batch, C: channels, H: height, W: width). A permutation operation may be necessary to modify the dimensions of a batch of images, which are typically NHWC.
- JAX flax.linen.Conv expects the input to be in the form NHWC (the typical dimensions of a batch of images).
Note
If PyTorch’s in_channels parameter is not specified, it will be inferred by using the torch.nn.LazyConv2d module
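The layout difference mentioned in the warning comes down to a reordering of dimensions. This small sketch works on shape tuples only (no tensors) to show which permutation turns an NHWC batch into NCHW and back; the batch size and image size are illustrative, and `permute` is a local helper that mirrors the usual "output dimension i takes input dimension dims[i]" convention.

```python
def permute(shape: tuple, dims: tuple) -> tuple:
    """Reorder a shape so that output dimension i is input dimension dims[i]."""
    return tuple(shape[d] for d in dims)


nhwc = (16, 84, 84, 3)  # batch of 16 RGB images, 84x84 pixels
nchw = permute(nhwc, (0, 3, 1, 2))  # move channels next to the batch dimension
back = permute(nchw, (0, 2, 3, 1))  # inverse permutation
print(nchw, back)
```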
Position | PyTorch | JAX | Type | Required | Description |
---|---|---|---|---|---|
 | in_channels | - | | \(\square\) | Number of input channels |
0 | out_channels | features | | \(\blacksquare\) | Number of output channels (filters) |
1 | kernel_size | kernel_size | | \(\blacksquare\) | Convolutional kernel size |
2 | stride | strides | | \(\square\) | Inter-window strides |
3 | padding | padding | | \(\square\) | Padding added to all dimensions |
4 | bias | use_bias | | \(\square\) | Whether to add a bias |
layers:
  - conv2d: [32, 8, [4, 4]]

Hint
The parameter names can be interchanged/mixed between PyTorch and JAX

layers:
  - conv2d: {out_channels: 32, kernel_size: 8, stride: [4, 4]}

"layers": [
    {"conv2d": [32, 8, [4, 4]]},
]

Hint
The parameter names can be interchanged/mixed between PyTorch and JAX

"layers": [
    {"conv2d": {"out_channels": 32, "kernel_size": 8, "stride": [4, 4]}},
]
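When chaining convolutions it helps to know the spatial size each one produces. The sketch below applies the standard output-size formula under the PyTorch convention (explicit integer padding, dilation of 1) to the example layer above, with a kernel size of 8 and a stride of 4. The 84x84 input size is an assumption chosen for illustration; padding modes such as "SAME" follow a different rule and are not covered.

```python
def conv2d_output_size(size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution along one dimension (dilation = 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# conv2d: [32, 8, [4, 4]] applied to a hypothetical 84x84 input
height = conv2d_output_size(84, kernel_size=8, stride=4)
width = conv2d_output_size(84, kernel_size=8, stride=4)
print((32, height, width))  # (channels, height, width) of the result
```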
flatten¶
Flatten a contiguous range of dimensions (torch.nn.Flatten in PyTorch, jax.numpy.reshape operation in JAX)
Position | PyTorch | JAX | Type | Required | Description |
---|---|---|---|---|---|
0 | start_dim | - | | \(\square\) | First dimension to flatten |
1 | end_dim | - | | \(\square\) | Last dimension to flatten |
layers:
  - flatten

layers:
  - flatten: [1, -1]

Hint
The parameter names can be interchanged/mixed between PyTorch and JAX

layers:
  - flatten: {start_dim: 1, end_dim: -1}

"layers": [
    "flatten",
]

"layers": [
    {"flatten": [1, -1]},
]

Hint
The parameter names can be interchanged/mixed between PyTorch and JAX

"layers": [
    {"flatten": {"start_dim": 1, "end_dim": -1}},
]
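The effect of the first and last dimension parameters can be checked on shapes alone. This sketch computes the shape that results from flattening dimensions `start_dim` through `end_dim` (inclusive, negative indices allowed), using the defaults 1 and -1 shown in the examples above. `flatten_shape` is a hypothetical helper and the input shape is illustrative.

```python
from math import prod


def flatten_shape(shape: tuple, start_dim: int = 1, end_dim: int = -1) -> tuple:
    """Shape obtained by merging dimensions start_dim..end_dim (inclusive) into one."""
    ndim = len(shape)
    start = start_dim % ndim  # resolve negative indices
    end = end_dim % ndim
    merged = prod(shape[start:end + 1])
    return tuple(shape[:start]) + (merged,) + tuple(shape[end + 1:])


batch = (16, 3, 84, 84)
flat_all = flatten_shape(batch)  # keep the batch dimension, merge the rest
flat_spatial = flatten_shape(batch, start_dim=2, end_dim=3)  # merge height and width only
print(flat_all, flat_spatial)
```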
API (PyTorch)¶
- skrl.utils.model_instantiators.torch.categorical_model(observation_space: int | Tuple[int] | gymnasium.Space | None = None, action_space: int | Tuple[int] | gymnasium.Space | None = None, device: str | torch.device | None = None, unnormalized_log_prob: bool = True, network: Sequence[Mapping[str, Any]] = [], output: str | Sequence[str] = '', return_source: bool = False, *args, **kwargs) Model | str ¶
Instantiate a categorical model
- Parameters:
observation_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Observation/state space or shape (default: None). If it is not None, the num_observations property will contain the size of that space
action_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Action space or shape (default: None). If it is not None, the num_actions property will contain the size of that space
device (str or torch.device, optional) – Device on which a tensor/array is or will be allocated (default: None). If None, the device will be either "cuda" if available or "cpu"
unnormalized_log_prob (bool, optional) – Flag indicating how the model’s output is interpreted (default: True). If True, the model’s output is interpreted as unnormalized log probabilities (it can be any real number), otherwise as normalized probabilities (the output must be non-negative, finite and have a non-zero sum)
network (list of dict, optional) – Network definition (default: [])
output (list or str, optional) – Output expression (default: “”)
return_source (bool, optional) – Whether to return the source string containing the model class used to instantiate the model rather than the model instance (default: False).
- Returns:
Categorical model instance or definition source
- Return type:
Model | str
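The two interpretations selected by `unnormalized_log_prob` can be sketched without any framework. With the flag set, the output is treated as logits and passed through a softmax; otherwise it is treated as non-negative weights that are rescaled to sum to 1. `to_probabilities` is a hypothetical helper for illustration and makes no claim about how the library's distribution class is implemented.

```python
import math


def to_probabilities(output: list, unnormalized_log_prob: bool = True) -> list:
    """Turn a model output vector into categorical probabilities."""
    if unnormalized_log_prob:
        # logits: any real numbers, normalized with a numerically stable softmax
        shift = max(output)
        exps = [math.exp(x - shift) for x in output]
        total = sum(exps)
        return [e / total for e in exps]
    # probabilities: must be non-negative, finite and have a non-zero sum
    if any(x < 0 or not math.isfinite(x) for x in output) or sum(output) == 0:
        raise ValueError("output must be non-negative, finite and have a non-zero sum")
    total = sum(output)
    return [x / total for x in output]


probs_from_logits = to_probabilities([2.0, 1.0, 0.1])
probs_from_weights = to_probabilities([2.0, 1.0, 1.0], unnormalized_log_prob=False)
print(probs_from_logits, probs_from_weights)
```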
- skrl.utils.model_instantiators.torch.multicategorical_model(observation_space: int | Tuple[int] | gymnasium.Space | None = None, action_space: int | Tuple[int] | gymnasium.Space | None = None, device: str | torch.device | None = None, unnormalized_log_prob: bool = True, reduction: str = 'sum', network: Sequence[Mapping[str, Any]] = [], output: str | Sequence[str] = '', return_source: bool = False, *args, **kwargs) Model | str ¶
Instantiate a multi-categorical model
- Parameters:
observation_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Observation/state space or shape (default: None). If it is not None, the num_observations property will contain the size of that space
action_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Action space or shape (default: None). If it is not None, the num_actions property will contain the size of that space
device (str or torch.device, optional) – Device on which a tensor/array is or will be allocated (default: None). If None, the device will be either "cuda" if available or "cpu"
unnormalized_log_prob (bool, optional) – Flag indicating how the model’s output is interpreted (default: True). If True, the model’s output is interpreted as unnormalized log probabilities (it can be any real number), otherwise as normalized probabilities (the output must be non-negative, finite and have a non-zero sum)
reduction (str, optional) – Reduction method for returning the log probability density function (default: "sum"). Supported values are "mean", "sum", "prod" and "none". If "none", the log probability density function is returned as a tensor of shape (num_samples, num_actions) instead of (num_samples, 1)
network (list of dict, optional) – Network definition (default: [])
output (list or str, optional) – Output expression (default: “”)
return_source (bool, optional) – Whether to return the source string containing the model class used to instantiate the model rather than the model instance (default: False).
- Returns:
Multi-Categorical model instance or definition source
- Return type:
Model | str
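The `reduction` options differ only in how the per-action log probabilities of each sample are combined. The sketch below uses nested lists in place of tensors to show the resulting shapes: (num_samples, 1) for "sum", "mean" and "prod", and (num_samples, num_actions) for "none". `reduce_log_prob` and the sample values are illustrative assumptions.

```python
import math


def reduce_log_prob(log_prob: list, reduction: str = "sum") -> list:
    """Combine per-action log probabilities for each sample (one row per sample)."""
    if reduction == "none":
        return [list(row) for row in log_prob]  # shape (num_samples, num_actions)
    reducers = {
        "sum": sum,
        "mean": lambda row: sum(row) / len(row),
        "prod": math.prod,
    }
    return [[reducers[reduction](row)] for row in log_prob]  # shape (num_samples, 1)


# two samples, two action components each
log_prob = [[-1.0, -2.0], [-0.5, -0.5]]
print(reduce_log_prob(log_prob, "sum"))
print(reduce_log_prob(log_prob, "none"))
```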
- skrl.utils.model_instantiators.torch.deterministic_model(observation_space: int | Tuple[int] | gymnasium.Space | None = None, action_space: int | Tuple[int] | gymnasium.Space | None = None, device: str | torch.device | None = None, clip_actions: bool = False, network: Sequence[Mapping[str, Any]] = [], output: str | Sequence[str] = '', return_source: bool = False, *args, **kwargs) Model | str ¶
Instantiate a deterministic model
- Parameters:
observation_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Observation/state space or shape (default: None). If it is not None, the num_observations property will contain the size of that space
action_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Action space or shape (default: None). If it is not None, the num_actions property will contain the size of that space
device (str or torch.device, optional) – Device on which a tensor/array is or will be allocated (default: None). If None, the device will be either "cuda" if available or "cpu"
clip_actions (bool, optional) – Flag to indicate whether the actions should be clipped (default: False)
network (list of dict, optional) – Network definition (default: [])
output (list or str, optional) – Output expression (default: “”)
return_source (bool, optional) – Whether to return the source string containing the model class used to instantiate the model rather than the model instance (default: False).
- Returns:
Deterministic model instance or definition source
- Return type:
Model | str
- skrl.utils.model_instantiators.torch.gaussian_model(observation_space: int | Tuple[int] | gymnasium.Space | None = None, action_space: int | Tuple[int] | gymnasium.Space | None = None, device: str | torch.device | None = None, clip_actions: bool = False, clip_log_std: bool = True, min_log_std: float = -20, max_log_std: float = 2, reduction: str = 'sum', initial_log_std: float = 0, fixed_log_std: bool = False, network: Sequence[Mapping[str, Any]] = [], output: str | Sequence[str] = '', return_source: bool = False, *args, **kwargs) Model | str ¶
Instantiate a Gaussian model
- Parameters:
observation_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Observation/state space or shape (default: None). If it is not None, the num_observations property will contain the size of that space
action_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Action space or shape (default: None). If it is not None, the num_actions property will contain the size of that space
device (str or torch.device, optional) – Device on which a tensor/array is or will be allocated (default: None). If None, the device will be either "cuda" if available or "cpu"
clip_actions (bool, optional) – Flag to indicate whether the actions should be clipped (default: False)
clip_log_std (bool, optional) – Flag to indicate whether the log standard deviations should be clipped (default: True)
min_log_std (float, optional) – Minimum value of the log standard deviation (default: -20)
max_log_std (float, optional) – Maximum value of the log standard deviation (default: 2)
reduction (str, optional) – Reduction method for returning the log probability density function (default: "sum"). Supported values are "mean", "sum", "prod" and "none". If "none", the log probability density function is returned as a tensor of shape (num_samples, num_actions) instead of (num_samples, 1)
initial_log_std (float, optional) – Initial value for the log standard deviation (default: 0)
fixed_log_std (bool, optional) – Whether the log standard deviation parameter should be fixed (default: False). Fixed parameters have the gradient computation deactivated
network (list of dict, optional) – Network definition (default: [])
output (list or str, optional) – Output expression (default: “”)
return_source (bool, optional) – Whether to return the source string containing the model class used to instantiate the model rather than the model instance (default: False).
- Returns:
Gaussian model instance or definition source
- Return type:
Model | str
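To show how `clip_log_std`, `min_log_std` and `max_log_std` interact with the log probability, here is a minimal sketch for a single sample with a diagonal Gaussian and "sum" reduction. The log standard deviation is clamped to the given range before the density is evaluated. `gaussian_log_prob` is a hypothetical helper; it follows the textbook Gaussian log density and is not taken from the library.

```python
import math


def gaussian_log_prob(actions, mean, log_std, clip_log_std=True,
                      min_log_std=-20.0, max_log_std=2.0) -> float:
    """Log probability of one action vector under a diagonal Gaussian (summed over actions)."""
    total = 0.0
    for action, mu, ls in zip(actions, mean, log_std):
        if clip_log_std:
            # clamp the log standard deviation to [min_log_std, max_log_std]
            ls = min(max(ls, min_log_std), max_log_std)
        std = math.exp(ls)
        # log N(action | mu, std^2)
        total += -0.5 * ((action - mu) / std) ** 2 - ls - 0.5 * math.log(2.0 * math.pi)
    return total


clipped = gaussian_log_prob([0.0], [0.0], [5.0])  # log_std 5.0 is clamped to 2.0
reference = gaussian_log_prob([0.0], [0.0], [2.0])
print(clipped, reference)
```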
- skrl.utils.model_instantiators.torch.multivariate_gaussian_model(observation_space: int | Tuple[int] | gymnasium.Space | None = None, action_space: int | Tuple[int] | gymnasium.Space | None = None, device: str | torch.device | None = None, clip_actions: bool = False, clip_log_std: bool = True, min_log_std: float = -20, max_log_std: float = 2, initial_log_std: float = 0, fixed_log_std: bool = False, network: Sequence[Mapping[str, Any]] = [], output: str | Sequence[str] = '', return_source: bool = False, *args, **kwargs) Model | str ¶
Instantiate a multivariate Gaussian model
- Parameters:
observation_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Observation/state space or shape (default: None). If it is not None, the num_observations property will contain the size of that space
action_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Action space or shape (default: None). If it is not None, the num_actions property will contain the size of that space
device (str or torch.device, optional) – Device on which a tensor/array is or will be allocated (default: None). If None, the device will be either "cuda" if available or "cpu"
clip_actions (bool, optional) – Flag to indicate whether the actions should be clipped (default: False)
clip_log_std (bool, optional) – Flag to indicate whether the log standard deviations should be clipped (default: True)
min_log_std (float, optional) – Minimum value of the log standard deviation (default: -20)
max_log_std (float, optional) – Maximum value of the log standard deviation (default: 2)
initial_log_std (float, optional) – Initial value for the log standard deviation (default: 0)
fixed_log_std (bool, optional) – Whether the log standard deviation parameter should be fixed (default: False). Fixed parameters have the gradient computation deactivated
network (list of dict, optional) – Network definition (default: [])
output (list or str, optional) – Output expression (default: “”)
return_source (bool, optional) – Whether to return the source string containing the model class used to instantiate the model rather than the model instance (default: False).
- Returns:
Multivariate Gaussian model instance or definition source
- Return type:
Model | str
Instantiate a shared model
- Parameters:
observation_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Observation/state space or shape (default: None). If it is not None, the num_observations property will contain the size of that space
action_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Action space or shape (default: None). If it is not None, the num_actions property will contain the size of that space
device (str or torch.device, optional) – Device on which a tensor/array is or will be allocated (default: None). If None, the device will be either "cuda" if available or "cpu"
structure (sequence of strings, optional) – Shared model structure (default: Gaussian-Deterministic).
roles (sequence of strings, optional) – Organized list of model roles (default: [])
parameters (sequence of dict, optional) – Organized list of model instantiator parameters (default: [])
single_forward_pass (bool) – Whether to perform a single forward-pass for the shared layers/network (default: True)
return_source (bool, optional) – Whether to return the source string containing the model class used to instantiate the model rather than the model instance (default: False).
- Returns:
Shared model instance or definition source
- Return type:
API (JAX)¶
- skrl.utils.model_instantiators.jax.categorical_model(observation_space: int | Tuple[int] | gymnasium.Space | None = None, action_space: int | Tuple[int] | gymnasium.Space | None = None, device: str | jax.Device | None = None, unnormalized_log_prob: bool = True, network: Sequence[Mapping[str, Any]] = [], output: str | Sequence[str] = '', return_source: bool = False, *args, **kwargs) Model | str ¶
Instantiate a categorical model
- Parameters:
observation_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Observation/state space or shape (default: None). If it is not None, the num_observations property will contain the size of that space
action_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Action space or shape (default: None). If it is not None, the num_actions property will contain the size of that space
device (str or jax.Device, optional) – Device on which a tensor/array is or will be allocated (default: None). If None, the device will be either "cuda" if available or "cpu"
unnormalized_log_prob (bool, optional) – Flag indicating how the model’s output is interpreted (default: True). If True, the model’s output is interpreted as unnormalized log probabilities (it can be any real number), otherwise as normalized probabilities (the output must be non-negative, finite and have a non-zero sum)
network (list of dict, optional) – Network definition (default: [])
output (list or str, optional) – Output expression (default: “”)
return_source (bool, optional) – Whether to return the source string containing the model class used to instantiate the model rather than the model instance (default: False).
- Returns:
Categorical model instance or definition source
- Return type:
Model | str
- skrl.utils.model_instantiators.jax.multicategorical_model(observation_space: int | Tuple[int] | gymnasium.Space | None = None, action_space: int | Tuple[int] | gymnasium.Space | None = None, device: str | jax.Device | None = None, unnormalized_log_prob: bool = True, reduction: str = 'sum', network: Sequence[Mapping[str, Any]] = [], output: str | Sequence[str] = '', return_source: bool = False, *args, **kwargs) Model | str ¶
Instantiate a multi-categorical model
- Parameters:
observation_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Observation/state space or shape (default: None). If it is not None, the num_observations property will contain the size of that space
action_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Action space or shape (default: None). If it is not None, the num_actions property will contain the size of that space
device (str or jax.Device, optional) – Device on which a tensor/array is or will be allocated (default: None). If None, the device will be either "cuda" if available or "cpu"
unnormalized_log_prob (bool, optional) – Flag indicating how the model’s output is interpreted (default: True). If True, the model’s output is interpreted as unnormalized log probabilities (it can be any real number), otherwise as normalized probabilities (the output must be non-negative, finite and have a non-zero sum)
reduction (str, optional) – Reduction method for returning the log probability density function (default: "sum"). Supported values are "mean", "sum", "prod" and "none". If "none", the log probability density function is returned as a tensor of shape (num_samples, num_actions) instead of (num_samples, 1)
network (list of dict, optional) – Network definition (default: [])
output (list or str, optional) – Output expression (default: “”)
return_source (bool, optional) – Whether to return the source string containing the model class used to instantiate the model rather than the model instance (default: False).
- Returns:
Multi-Categorical model instance or definition source
- Return type:
Model | str
- skrl.utils.model_instantiators.jax.deterministic_model(observation_space: int | Tuple[int] | gymnasium.Space | None = None, action_space: int | Tuple[int] | gymnasium.Space | None = None, device: str | jax.Device | None = None, clip_actions: bool = False, network: Sequence[Mapping[str, Any]] = [], output: str | Sequence[str] = '', return_source: bool = False, *args, **kwargs) Model | str ¶
Instantiate a deterministic model
- Parameters:
observation_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Observation/state space or shape (default: None). If it is not None, the num_observations property will contain the size of that space
action_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Action space or shape (default: None). If it is not None, the num_actions property will contain the size of that space
device (str or jax.Device, optional) – Device on which a tensor/array is or will be allocated (default: None). If None, the device will be either "cuda" if available or "cpu"
clip_actions (bool, optional) – Flag to indicate whether the actions should be clipped (default: False)
network (list of dict, optional) – Network definition (default: [])
output (list or str, optional) – Output expression (default: “”)
return_source (bool, optional) – Whether to return the source string containing the model class used to instantiate the model rather than the model instance (default: False).
- Returns:
Deterministic model instance or definition source
- Return type:
Model | str
- skrl.utils.model_instantiators.jax.gaussian_model(observation_space: int | Tuple[int] | gymnasium.Space | None = None, action_space: int | Tuple[int] | gymnasium.Space | None = None, device: str | jax.Device | None = None, clip_actions: bool = False, clip_log_std: bool = True, min_log_std: float = -20, max_log_std: float = 2, reduction: str = 'sum', initial_log_std: float = 0, fixed_log_std: bool = False, network: Sequence[Mapping[str, Any]] = [], output: str | Sequence[str] = '', return_source: bool = False, *args, **kwargs) Model | str ¶
Instantiate a Gaussian model
- Parameters:
observation_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Observation/state space or shape (default: None). If it is not None, the num_observations property will contain the size of that space
action_space (int, tuple or list of integers, gymnasium.Space or None, optional) – Action space or shape (default: None). If it is not None, the num_actions property will contain the size of that space
device (str or jax.Device, optional) – Device on which a tensor/array is or will be allocated (default: None). If None, the device will be either "cuda" if available or "cpu"
clip_actions (bool, optional) – Flag to indicate whether the actions should be clipped (default: False)
clip_log_std (bool, optional) – Flag to indicate whether the log standard deviations should be clipped (default: True)
min_log_std (float, optional) – Minimum value of the log standard deviation (default: -20)
max_log_std (float, optional) – Maximum value of the log standard deviation (default: 2)
reduction (str, optional) – Reduction method for returning the log probability density function (default: "sum"). Supported values are "mean", "sum", "prod" and "none". If "none", the log probability density function is returned as a tensor of shape (num_samples, num_actions) instead of (num_samples, 1)
initial_log_std (float, optional) – Initial value for the log standard deviation (default: 0)
fixed_log_std (bool, optional) – Whether the log standard deviation parameter should be fixed (default: False). Fixed parameters will be excluded from model parameters.
network (list of dict, optional) – Network definition (default: [])
output (list or str, optional) – Output expression (default: “”)
return_source (bool, optional) – Whether to return the source string containing the model class used to instantiate the model rather than the model instance (default: False).
- Returns:
Gaussian model instance or definition source
- Return type:
Model | str