Learning rate schedulers

The implemented schedulers inherit from the PyTorch _LRScheduler class. Visit How to adjust learning rate in the PyTorch documentation for more details.

Implemented learning rate schedulers

  • KL Adaptive

Basic usage

The learning rate scheduler is configured in each agent's configuration dictionary: the scheduler class is set under the "learning_rate_scheduler" key, and its constructor arguments under the "learning_rate_scheduler_kwargs" key as a keyword-argument dictionary, without specifying the optimizer (the first argument, which the agent supplies). The following example shows how to set a scheduler for an agent:

# import the scheduler class
from torch.optim.lr_scheduler import StepLR

# start from the agent's default configuration dictionary
# (shown generically here as DEFAULT_CONFIG, e.g. PPO_DEFAULT_CONFIG for PPO)
cfg = DEFAULT_CONFIG.copy()
cfg["learning_rate_scheduler"] = StepLR
cfg["learning_rate_scheduler_kwargs"] = {"step_size": 1, "gamma": 0.9}

KL Adaptive

Algorithm implementation

The learning rate (\(\eta\)) at each step is modified as follows:

\[
\begin{aligned}
& \text{IF} \;\; KL > \text{kl\_factor} \cdot \text{kl\_threshold} \;\; \text{THEN} \\
& \qquad \eta_{t+1} = \max(\text{lr\_factor}^{-1} \, \eta_t, \; \text{min\_lr}) \\
& \text{IF} \;\; KL < \text{kl\_factor}^{-1} \cdot \text{kl\_threshold} \;\; \text{THEN} \\
& \qquad \eta_{t+1} = \min(\text{lr\_factor} \, \eta_t, \; \text{max\_lr})
\end{aligned}
\]
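
The same rule, restated as a small standalone Python sketch (for illustration only, not the library implementation):

def kl_adaptive_update(lr: float, kl: float,
                       kl_threshold: float = 0.008,
                       min_lr: float = 1e-6, max_lr: float = 1e-2,
                       kl_factor: float = 2.0, lr_factor: float = 1.5) -> float:
    """Illustrative restatement of the KL adaptive update rule"""
    if kl > kl_factor * kl_threshold:
        # KL divergence too large: shrink the learning rate, but not below min_lr
        return max(lr / lr_factor, min_lr)
    if kl < kl_threshold / kl_factor:
        # KL divergence too small: grow the learning rate, but not above max_lr
        return min(lr * lr_factor, max_lr)
    # otherwise keep the learning rate unchanged
    return lr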

API

class skrl.resources.schedulers.torch.kl_adaptive.KLAdaptiveRL(optimizer: torch.optim.optimizer.Optimizer, kl_threshold: float = 0.008, min_lr: float = 1e-06, max_lr: float = 0.01, kl_factor: float = 2, lr_factor: float = 1.5, last_epoch: int = -1, verbose: bool = False)

Bases: torch.optim.lr_scheduler._LRScheduler

__init__(optimizer: torch.optim.optimizer.Optimizer, kl_threshold: float = 0.008, min_lr: float = 1e-06, max_lr: float = 0.01, kl_factor: float = 2, lr_factor: float = 1.5, last_epoch: int = -1, verbose: bool = False) → None

Adaptive KL scheduler

Adjusts the learning rate according to the KL divergence. The implementation is adapted from the rl_games library (https://github.com/Denys88/rl_games/blob/master/rl_games/common/schedulers.py)

Note

This scheduler is only available for PPO at the moment. Applying it to other agents will not change the learning rate

Example:

>>> scheduler = KLAdaptiveRL(optimizer, kl_threshold=0.01)
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     kl_divergence = ...
>>>     scheduler.step(kl_divergence)
Parameters
  • optimizer (torch.optim.Optimizer) – Wrapped optimizer

  • kl_threshold (float, optional) – Threshold for KL divergence (default: 0.008)

  • min_lr (float, optional) – Lower bound for learning rate (default: 1e-6)

  • max_lr (float, optional) – Upper bound for learning rate (default: 1e-2)

  • kl_factor (float, optional) – The number used to modify the KL divergence threshold (default: 2)

  • lr_factor (float, optional) – The number used to modify the learning rate (default: 1.5)

  • last_epoch (int, optional) – The index of last epoch (default: -1)

  • verbose (bool, optional) – Verbose mode (default: False)
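
Following the basic-usage pattern above, this scheduler can be set in a PPO agent's configuration dictionary (a sketch; the keyword value shown is the default):

# import the scheduler class using the module path given above
from skrl.resources.schedulers.torch.kl_adaptive import KLAdaptiveRL
from skrl.agents.torch.ppo import PPO_DEFAULT_CONFIG

cfg = PPO_DEFAULT_CONFIG.copy()
cfg["learning_rate_scheduler"] = KLAdaptiveRL
cfg["learning_rate_scheduler_kwargs"] = {"kl_threshold": 0.008}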

get_last_lr()

Return last computed learning rate by current scheduler.

load_state_dict(state_dict)

Loads the scheduler's state.

Args:
    state_dict (dict): scheduler state. Should be an object returned from a call to state_dict().

print_lr(is_verbose, group, lr, epoch=None)

Display the current learning rate.

state_dict()

Returns the state of the scheduler as a dict.

It contains an entry for every variable in self.__dict__ which is not the optimizer.
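
As with any PyTorch scheduler, the state returned by state_dict() can be saved and later restored with load_state_dict(); a brief sketch (scheduler created as in the example above):

import torch

# save the scheduler state as part of a checkpoint
torch.save({"scheduler": scheduler.state_dict()}, "checkpoint.pt")

# ... later, restore it into a newly created scheduler
checkpoint = torch.load("checkpoint.pt")
scheduler.load_state_dict(checkpoint["scheduler"])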

step(kl: Optional[Union[torch.Tensor, float]] = None, epoch: Optional[int] = None) → None

Step scheduler

Example:

>>> kl = torch.distributions.kl_divergence(p, q)
>>> kl
tensor([0.0332, 0.0500, 0.0383,  ..., 0.0076, 0.0240, 0.0164])
>>> scheduler.step(kl.mean())

>>> kl = 0.0046
>>> scheduler.step(kl)
Parameters
  • kl (torch.Tensor, float, None, optional) – KL divergence (default: None). If None, no adjustment is made. If a tensor, it must contain a single element

  • epoch (int, optional) – Epoch (default: None)