# Running standard scaler

Standardize input features by removing the mean and scaling to unit variance.

## Algorithm

### Algorithm implementation

Main notation/symbols:
- mean ($$\bar{x}$$), standard deviation ($$\sigma$$), variance ($$\sigma^2$$)
- running mean ($$\bar{x}_t$$), running variance ($$\sigma^2_t$$)

Standardization by centering and scaling

$$\text{clip}\left(\dfrac{x - \bar{x}_t}{\sqrt{\sigma^2_t} + \epsilon}, -c, c\right) \qquad$$ with $$c$$ as clip_threshold and $$\epsilon$$ as epsilon
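
The standardization step can be sketched in plain Python for a single feature. This is a minimal illustration rather than skrl's implementation: the function name `standardize` and its arguments are hypothetical, and the running statistics are passed in explicitly instead of being stored in the object.

```python
import math


def standardize(x, running_mean, running_var, epsilon=1e-8, clip_threshold=5.0):
    """Center by the running mean, scale by the running std, then clip.

    Hypothetical helper for a single scalar feature; the defaults mirror
    the documented ``epsilon`` and ``clip_threshold`` values.
    """
    z = (x - running_mean) / (math.sqrt(running_var) + epsilon)
    # clip(z, -c, c)
    return max(-clip_threshold, min(clip_threshold, z))


# a value two standard deviations above the mean maps to roughly 2
z_inlier = standardize(7.0, running_mean=3.0, running_var=4.0)
# an extreme outlier is clipped to the threshold
z_outlier = standardize(1000.0, running_mean=3.0, running_var=4.0)
# epsilon keeps the division finite when the variance is zero
z_constant = standardize(1.0, running_mean=1.0, running_var=0.0)
```

Adding epsilon to the denominator means the result is very slightly smaller in magnitude than an exact z-score, which is usually negligible next to the clipping range.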

Scale back the data to the original representation (inverse transform)

$$\sqrt{\sigma^2_t} \; \text{clip}(x, -c, c) + \bar{x}_t \qquad$$ with $$c$$ as clip_threshold
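
A matching sketch of the inverse transform, again for a single feature. The helper name `inverse_standardize` is hypothetical and the running statistics are passed in explicitly; this is an illustration of the formula, not the library code.

```python
import math


def inverse_standardize(z, running_mean, running_var, clip_threshold=5.0):
    """Map a standardized value back to the original scale.

    Hypothetical helper for a single scalar feature: the input is clipped
    first, then multiplied by the running std and shifted by the running mean.
    """
    z = max(-clip_threshold, min(clip_threshold, z))
    return math.sqrt(running_var) * z + running_mean


x_restored = inverse_standardize(2.0, running_mean=3.0, running_var=4.0)
# inputs beyond the threshold are clipped before scaling back
x_clipped = inverse_standardize(10.0, running_mean=3.0, running_var=4.0)
```

Because the inverse formula does not include epsilon, standardizing and then inverting is only approximately the identity, and values that were clipped cannot be recovered.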

Update the running mean and variance from a batch of $$n$$ samples with mean $$\bar{x}$$ and variance $$\sigma^2$$, where $$n_t$$ is the number of samples seen so far (see the parallel algorithm for calculating variance)

$$\delta \leftarrow \bar{x} - \bar{x}_t$$

$$n_T \leftarrow n_t + n$$

$$M2 \leftarrow (\sigma^2_t n_t) + (\sigma^2 n) + \delta^2 \dfrac{n_t n}{n_T}$$

Update the internal variables

$$\bar{x}_t \leftarrow \bar{x}_t + \delta \dfrac{n}{n_T}$$

$$\sigma^2_t \leftarrow \dfrac{M2}{n_T}$$

$$n_t \leftarrow n_T$$
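
The update rule can be checked numerically with the standard library. The sketch below is a single-feature illustration under stated assumptions: `parallel_update` is a hypothetical name, and variances are population variances (divided by the number of samples), which is what makes the merged result match a direct computation on the concatenated data.

```python
import statistics


def parallel_update(run_mean, run_var, run_count, batch_mean, batch_var, batch_count):
    """Merge batch statistics into running statistics (single feature).

    Hypothetical helper following the update rule above; all variances
    are population variances.
    """
    delta = batch_mean - run_mean
    total_count = run_count + batch_count
    m2 = run_var * run_count + batch_var * batch_count + delta**2 * run_count * batch_count / total_count
    new_mean = run_mean + delta * batch_count / total_count
    new_var = m2 / total_count
    return new_mean, new_var, total_count


first = [1.0, 2.0, 3.0, 4.0]
second = [10.0, 12.0, 14.0]

# statistics of the first batch act as the running state
state = (statistics.fmean(first), statistics.pvariance(first), len(first))
# merge the second batch into the running state
merged_mean, merged_var, merged_count = parallel_update(
    *state, statistics.fmean(second), statistics.pvariance(second), len(second)
)

# reference values computed directly on the concatenated data
expected_mean = statistics.fmean(first + second)
expected_var = statistics.pvariance(first + second)
```

Only the mean, variance, and count need to be stored between updates, so no past samples have to be kept in memory.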

## Usage

Preprocessor usage is defined in each agent's configuration dictionary.

The preprocessor class is set under the "<variable>_preprocessor" key, and its arguments are set under the "<variable>_preprocessor_kwargs" key as a dictionary of keyword arguments. The following example shows how to set the state and value preprocessors for an agent:

# import the preprocessor class
from skrl.resources.preprocessors.torch import RunningStandardScaler

# DEFAULT_CONFIG stands for the agent's default configuration (for example, PPO_DEFAULT_CONFIG);
# env and device are assumed to be defined beforehand
cfg = DEFAULT_CONFIG.copy()
cfg["state_preprocessor"] = RunningStandardScaler
cfg["state_preprocessor_kwargs"] = {"size": env.observation_space, "device": device}
cfg["value_preprocessor"] = RunningStandardScaler
cfg["value_preprocessor_kwargs"] = {"size": 1, "device": device}
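
To make the key convention concrete, the sketch below shows one way a class and its keyword arguments could be turned into an instance. It is an illustration only: `DummyScaler` and `build_preprocessor` are hypothetical stand-ins that are not part of skrl, and the real agents may resolve these keys differently.

```python
class DummyScaler:
    """Hypothetical stand-in for a preprocessor class such as RunningStandardScaler."""

    def __init__(self, size, device=None):
        self.size = size
        self.device = device


def build_preprocessor(cfg, variable):
    """Instantiate the class stored under '<variable>_preprocessor', if any.

    Hypothetical helper: the keyword arguments come from
    '<variable>_preprocessor_kwargs' (an empty dict if the key is missing).
    """
    preprocessor_class = cfg.get(f"{variable}_preprocessor")
    if preprocessor_class is None:
        return None  # no preprocessor configured for this variable
    return preprocessor_class(**cfg.get(f"{variable}_preprocessor_kwargs", {}))


cfg = {
    "state_preprocessor": DummyScaler,
    "state_preprocessor_kwargs": {"size": 4, "device": "cpu"},
    "value_preprocessor": None,
}

state_preprocessor = build_preprocessor(cfg, "state")
value_preprocessor = build_preprocessor(cfg, "value")
```

Storing the class and its arguments separately lets the configuration stay a plain dictionary while the agent decides when to create the instance.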


## API (PyTorch)

class skrl.resources.preprocessors.torch.running_standard_scaler.RunningStandardScaler(*args: Any, **kwargs: Any)
__init__(size: int | Tuple[int] | gym.Space | gymnasium.Space, epsilon: float = 1e-08, clip_threshold: float = 5.0, device: str | torch.device | None = None) -> None

Standardize the input data by removing the mean and scaling by the standard deviation

The implementation is adapted from the rl_games library (https://github.com/Denys88/rl_games/blob/master/rl_games/algos_torch/running_mean_std.py)

Example:

>>> running_standard_scaler = RunningStandardScaler(size=2)
>>> data = torch.rand(3, 2)  # tensor of shape (N, 2)
>>> running_standard_scaler(data)
tensor([[0.1954, 0.3356],
        [0.9719, 0.4163],
        [0.8540, 0.1982]])

Parameters:
• size (int, tuple or list of integers, gym.Space, or gymnasium.Space) – Size of the input space

• epsilon (float) – Small number to avoid division by zero (default: 1e-8)

• clip_threshold (float) – Threshold to clip the data (default: 5.0)

• device (str or torch.device, optional) – Device on which a tensor/array is or will be allocated (default: None). If None, "cuda" is used when available, otherwise "cpu"

_get_space_size(space: int | Tuple[int] | gym.Space | gymnasium.Space) -> int

Get the size (number of elements) of a space

Parameters:

space (int, tuple or list of integers, gym.Space, or gymnasium.Space) – Space or shape from which to obtain the number of elements

Raises:

ValueError – If the space is not supported

Returns:

Space size (number of elements)

Return type:

int
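
As a rough illustration of what "number of elements" means for the simplest inputs, the sketch below handles only integers and shapes. It is a hypothetical simplification (`get_space_size` is not the library function), and gym.Space and gymnasium.Space inputs are deliberately left out.

```python
import math


def get_space_size(space):
    """Return the number of elements described by an int or a shape.

    Hypothetical simplification: only ints and tuples/lists of ints are
    handled; anything else raises ValueError.
    """
    if isinstance(space, int):
        return space
    if isinstance(space, (tuple, list)) and all(isinstance(n, int) for n in space):
        return math.prod(space)
    raise ValueError(f"Space type {type(space)} not supported")


size_vector = get_space_size(8)            # an int is already a size
size_image = get_space_size((3, 64, 64))   # a shape is reduced to the product of its dimensions
```

Passing an unsupported type, such as a string, raises `ValueError` in this sketch, mirroring the documented behavior.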

_parallel_variance(input_mean: torch.Tensor, input_var: torch.Tensor, input_count: int) -> None

Update internal variables using the parallel algorithm for computing variance

https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm

Parameters:
• input_mean (torch.Tensor) – Mean of the input data

• input_var (torch.Tensor) – Variance of the input data

• input_count (int) – Batch size of the input data

forward(x: torch.Tensor, train: bool = False, inverse: bool = False, no_grad: bool = True) -> torch.Tensor

Forward pass of the standardizer

Example:

>>> x = torch.rand(3, 2, device="cuda:0")
>>> running_standard_scaler(x)
tensor([[0.6933, 0.1905],
        [0.3806, 0.3162],
        [0.1140, 0.0272]], device='cuda:0')

>>> running_standard_scaler(x, train=True)
tensor([[ 0.8681, -0.6731],
        [ 0.0560, -0.3684],
        [-0.6360, -1.0690]], device='cuda:0')

>>> running_standard_scaler(x, inverse=True)
tensor([[0.6260, 0.5468],
        [0.5056, 0.5987],
        [0.4029, 0.4795]], device='cuda:0')

Parameters:
• x (torch.Tensor) – Input tensor

• train (bool, optional) – Whether to train the standardizer (default: False)

• inverse (bool, optional) – Whether to inverse the standardizer to scale back the data (default: False)

• no_grad (bool, optional) – Whether to disable the gradient computation (default: True)

Returns:

Standardized tensor, or the tensor scaled back to the original representation if inverse is True

Return type:

torch.Tensor
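
The interaction between `train` and `inverse` can be sketched with a single-feature, standard-library version of the scaler. Everything here is illustrative: `ScalarRunningScaler` is a hypothetical class, the initial state (zero mean, unit variance, one pseudo-sample) is an assumption chosen so that an untrained call returns its input nearly unchanged, and the batch variance is taken as the population variance.

```python
import math
import statistics


class ScalarRunningScaler:
    """Hypothetical single-feature sketch of the train/inverse call semantics."""

    def __init__(self, epsilon=1e-8, clip_threshold=5.0):
        self.epsilon = epsilon
        self.clip_threshold = clip_threshold
        # assumed initial state: zero mean, unit variance, one pseudo-sample
        self.mean, self.var, self.count = 0.0, 1.0, 1

    def _clip(self, value):
        return max(-self.clip_threshold, min(self.clip_threshold, value))

    def __call__(self, batch, train=False, inverse=False):
        if train:
            # fold the batch statistics into the running statistics first
            n = len(batch)
            delta = statistics.fmean(batch) - self.mean
            total = self.count + n
            m2 = self.var * self.count + statistics.pvariance(batch) * n + delta**2 * self.count * n / total
            self.mean += delta * n / total
            self.var = m2 / total
            self.count = total
        std = math.sqrt(self.var)
        if inverse:
            # scale back to the original representation
            return [std * self._clip(v) + self.mean for v in batch]
        # standardize with the current running statistics
        return [self._clip((v - self.mean) / (std + self.epsilon)) for v in batch]


scaler = ScalarRunningScaler()
batch = [1.0, 2.0, 3.0]

untrained = scaler(batch)                 # statistics untouched, output close to the input
trained = scaler(batch, train=True)       # statistics updated, then the batch is standardized
restored = scaler(trained, inverse=True)  # scaled back with the updated statistics
```

In this sketch a call with `train=False` never modifies the running statistics, so it is safe to use during evaluation.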

## API (JAX)

class skrl.resources.preprocessors.jax.running_standard_scaler.RunningStandardScaler(size: int | Tuple[int] | gym.Space | gymnasium.Space, epsilon: float = 1e-08, clip_threshold: float = 5.0, device: str | jax.Device | None = None)
__init__(size: int | Tuple[int] | gym.Space | gymnasium.Space, epsilon: float = 1e-08, clip_threshold: float = 5.0, device: str | jax.Device | None = None) -> None

Standardize the input data by removing the mean and scaling by the standard deviation

The implementation is adapted from the rl_games library (https://github.com/Denys88/rl_games/blob/master/rl_games/algos_torch/running_mean_std.py)

Example:

>>> running_standard_scaler = RunningStandardScaler(size=2)
>>> data = jax.random.uniform(jax.random.PRNGKey(0), (3,2))  # array of shape (N, 2)
>>> running_standard_scaler(data)
Array([[0.57450044, 0.09968603],
       [0.7419659 , 0.8941783 ],
       [0.59656656, 0.45325184]], dtype=float32)

Parameters:
• size (int, tuple or list of integers, gym.Space, or gymnasium.Space) – Size of the input space

• epsilon (float) – Small number to avoid division by zero (default: 1e-8)

• clip_threshold (float) – Threshold to clip the data (default: 5.0)

• device (str or jax.Device, optional) – Device on which a tensor/array is or will be allocated (default: None). If None, "cuda" is used when available, otherwise "cpu"

__call__(x: np.ndarray | jax.Array, train: bool = False, inverse: bool = False) -> np.ndarray | jax.Array

Forward pass of the standardizer

Example:

>>> x = jax.random.uniform(jax.random.PRNGKey(0), (3,2))
>>> running_standard_scaler(x)
Array([[0.57450044, 0.09968603],
       [0.7419659 , 0.8941783 ],
       [0.59656656, 0.45325184]], dtype=float32)

>>> running_standard_scaler(x, train=True)
Array([[ 0.167439  , -0.4292293 ],
       [ 0.45878986,  0.8719094 ],
       [ 0.20582889,  0.14980486]], dtype=float32)

>>> running_standard_scaler(x, inverse=True)
Array([[0.80847514, 0.4226486 ],
       [0.9047325 , 0.90777594],
       [0.8211585 , 0.6385405 ]], dtype=float32)

Parameters:
• x (np.ndarray or jax.Array) – Input tensor

• train (bool, optional) – Whether to train the standardizer (default: False)

• inverse (bool, optional) – Whether to inverse the standardizer to scale back the data (default: False)

Returns:

Standardized array, or the array scaled back to the original representation if inverse is True

Return type:

np.ndarray or jax.Array

_get_space_size(space: int | Tuple[int] | gym.Space | gymnasium.Space) -> int

Get the size (number of elements) of a space

Parameters:

space (int, tuple or list of integers, gym.Space, or gymnasium.Space) – Space or shape from which to obtain the number of elements

Raises:

ValueError – If the space is not supported

Returns:

Space size (number of elements)

Return type:

int

_parallel_variance(input_mean: np.ndarray | jax.Array, input_var: np.ndarray | jax.Array, input_count: int) -> None

Update internal variables using the parallel algorithm for computing variance

https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm

Parameters:
• input_mean (np.ndarray or jax.Array) – Mean of the input data

• input_var (np.ndarray or jax.Array) – Variance of the input data

• input_count (int) – Batch size of the input data

property state_dict: Mapping[str, ndarray | jax.Array]

Dictionary containing references to the whole state of the module
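
The phrase "references to the whole state" can be illustrated with a small stand-in object. This is a sketch under stated assumptions: `TinyScalerState` is hypothetical, the key names are chosen for illustration and are not confirmed by this page, and plain lists stand in for arrays.

```python
class TinyScalerState:
    """Hypothetical container mimicking a module that exposes its state as a mapping."""

    def __init__(self):
        self.running_mean = [0.0, 0.0]
        self.running_variance = [1.0, 1.0]
        self.current_count = 1

    @property
    def state_dict(self):
        # the values are references to the stored objects, not copies
        return {
            "running_mean": self.running_mean,
            "running_variance": self.running_variance,
            "current_count": self.current_count,
        }


module = TinyScalerState()
snapshot = module.state_dict
shares_storage = snapshot["running_mean"] is module.running_mean
```

Because the mapping holds references rather than copies, a caller that needs an independent checkpoint would copy the values before the statistics are updated again.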