Learning rate schedulers

The implemented schedulers inherit from the PyTorch _LRScheduler class. Visit How to adjust learning rate in the PyTorch documentation for more details.

Implemented learning rate schedulers

  • KL Adaptive

Basic usage

The learning rate scheduler is configured in each agent's configuration dictionary: the scheduler class is set under the "learning_rate_scheduler" key, and its constructor arguments under the "learning_rate_scheduler_kwargs" key as a keyword-argument dictionary, without specifying the optimizer (the first argument, which the agent supplies). The following example shows how to set a scheduler for an agent:

# import the scheduler class
from torch.optim.lr_scheduler import StepLR

# start from the agent's default configuration dictionary
# (shown generically here as DEFAULT_CONFIG, e.g. PPO_DEFAULT_CONFIG for PPO)
cfg = DEFAULT_CONFIG.copy()
cfg["learning_rate_scheduler"] = StepLR
cfg["learning_rate_scheduler_kwargs"] = {"step_size": 1, "gamma": 0.9}

KL Adaptive

Algorithm implementation

The learning rate (\(\eta\)) at each step is modified as follows:

\[
\begin{aligned}
& \text{IF} \;\; KL > \text{kl\_factor} \cdot \text{kl\_threshold} \;\; \text{THEN} \\
& \qquad \eta_{t+1} = \max(\text{lr\_factor}^{-1} \, \eta_t, \; \text{min\_lr}) \\
& \text{IF} \;\; KL < \text{kl\_factor}^{-1} \cdot \text{kl\_threshold} \;\; \text{THEN} \\
& \qquad \eta_{t+1} = \min(\text{lr\_factor} \, \eta_t, \; \text{max\_lr})
\end{aligned}
\]
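
The same rule, restated as a small standalone Python sketch (for illustration only, not the library implementation):

def kl_adaptive_update(lr: float, kl: float,
                       kl_threshold: float = 0.008,
                       min_lr: float = 1e-6, max_lr: float = 1e-2,
                       kl_factor: float = 2.0, lr_factor: float = 1.5) -> float:
    """Illustrative restatement of the KL adaptive update rule"""
    if kl > kl_factor * kl_threshold:
        # KL divergence too large: shrink the learning rate, but not below min_lr
        return max(lr / lr_factor, min_lr)
    if kl < kl_threshold / kl_factor:
        # KL divergence too small: grow the learning rate, but not above max_lr
        return min(lr * lr_factor, max_lr)
    # otherwise keep the learning rate unchanged
    return lr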

API

class skrl.resources.schedulers.torch.kl_adaptive.KLAdaptiveRL(optimizer: torch.optim.optimizer.Optimizer, kl_threshold: float = 0.008, min_lr: float = 1e-06, max_lr: float = 0.01, kl_factor: float = 2, lr_factor: float = 1.5, last_epoch: int = -1, verbose: bool = False)

Bases: torch.optim.lr_scheduler._LRScheduler

__init__(optimizer: torch.optim.optimizer.Optimizer, kl_threshold: float = 0.008, min_lr: float = 1e-06, max_lr: float = 0.01, kl_factor: float = 2, lr_factor: float = 1.5, last_epoch: int = -1, verbose: bool = False) → None

Adaptive KL scheduler

Adjusts the learning rate according to the KL divergence. The implementation is adapted from the rl_games library (https://github.com/Denys88/rl_games/blob/master/rl_games/common/schedulers.py)

Note

This scheduler is only available for PPO at the moment. Applying it to other agents will not change the learning rate

Example:

>>> scheduler = KLAdaptiveRL(optimizer, kl_threshold=0.01)
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     kl_divergence = ...
>>>     scheduler.step(kl_divergence)
Parameters
  • optimizer (torch.optim.Optimizer) – Wrapped optimizer

  • kl_threshold (float, optional) – Threshold for KL divergence (default: 0.008)

  • min_lr (float, optional) – Lower bound for learning rate (default: 1e-6)

  • max_lr (float, optional) – Upper bound for learning rate (default: 1e-2)

  • kl_factor (float, optional) – The number used to modify the KL divergence threshold (default: 2)

  • lr_factor (float, optional) – The number used to modify the learning rate (default: 1.5)

  • last_epoch (int, optional) – The index of last epoch (default: -1)

  • verbose (bool, optional) – Verbose mode (default: False)
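
Following the basic-usage pattern above, this scheduler can be set in a PPO agent's configuration dictionary (a sketch; the keyword value shown is the default):

# import the scheduler class using the module path given above
from skrl.resources.schedulers.torch.kl_adaptive import KLAdaptiveRL
from skrl.agents.torch.ppo import PPO_DEFAULT_CONFIG

cfg = PPO_DEFAULT_CONFIG.copy()
cfg["learning_rate_scheduler"] = KLAdaptiveRL
cfg["learning_rate_scheduler_kwargs"] = {"kl_threshold": 0.008}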

get_last_lr()

Return last computed learning rate by current scheduler.

load_state_dict(state_dict)

Loads the scheduler's state.

Args:
    state_dict (dict): scheduler state. Should be an object returned from a call to state_dict().

print_lr(is_verbose, group, lr, epoch=None)

Display the current learning rate.

state_dict()

Returns the state of the scheduler as a dict.

It contains an entry for every variable in self.__dict__ which is not the optimizer.
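
As with any PyTorch scheduler, the state returned by state_dict() can be saved and later restored with load_state_dict(); a brief sketch (scheduler created as in the example above):

import torch

# save the scheduler state as part of a checkpoint
torch.save({"scheduler": scheduler.state_dict()}, "checkpoint.pt")

# ... later, restore it into a newly created scheduler
checkpoint = torch.load("checkpoint.pt")
scheduler.load_state_dict(checkpoint["scheduler"])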

step(kl: Optional[Union[torch.Tensor, float]] = None, epoch: Optional[int] = None) → None

Step scheduler

Example:

>>> kl = torch.distributions.kl_divergence(p, q)
>>> kl
tensor([0.0332, 0.0500, 0.0383,  ..., 0.0076, 0.0240, 0.0164])
>>> scheduler.step(kl.mean())

>>> kl = 0.0046
>>> scheduler.step(kl)
Parameters
  • kl (torch.Tensor, float, None, optional) – KL divergence (default: None). If None, no adjustment is made. If a tensor, it must contain a single element

  • epoch (int, optional) – Epoch (default: None)