Learning rate schedulers
The implemented schedulers inherit from the PyTorch _LRScheduler class. Visit How to adjust learning rate in the PyTorch documentation for more details.
Implemented learning rate schedulers
- KL Adaptive
Basic usage
The learning rate scheduler to use is defined in each agent's configuration dictionary. The scheduler class is set under the "learning_rate_scheduler" key and its arguments are set under the "learning_rate_scheduler_kwargs" key as a dictionary of keyword arguments, without including the optimizer (the scheduler's first positional argument). The following examples show how to set a scheduler for an agent:
# import the scheduler class from PyTorch
from torch.optim.lr_scheduler import StepLR
cfg = DEFAULT_CONFIG.copy()
cfg["learning_rate_scheduler"] = StepLR
cfg["learning_rate_scheduler_kwargs"] = {"step_size": 1, "gamma": 0.9}

# import the scheduler class from skrl
from skrl.resources.schedulers.torch import KLAdaptiveRL
cfg = DEFAULT_CONFIG.copy()
cfg["learning_rate_scheduler"] = KLAdaptiveRL
cfg["learning_rate_scheduler_kwargs"] = {"kl_threshold": 0.01}
KL Adaptive
Algorithm implementation
The learning rate (\(\eta\)) at each step is modified as follows:
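Based on the constructor parameters documented below and the rl_games implementation this scheduler is adapted from, the update rule can be sketched as:

\[
\eta_{t+1} = \begin{cases}
\max(\eta_t \,/\, \text{lr\_factor}, \; \text{min\_lr}) & \text{if } D_{KL} > \text{kl\_threshold} \cdot \text{kl\_factor} \\
\min(\eta_t \cdot \text{lr\_factor}, \; \text{max\_lr}) & \text{if } D_{KL} < \text{kl\_threshold} \,/\, \text{kl\_factor} \\
\eta_t & \text{otherwise}
\end{cases}
\]

where \(D_{KL}\) is the KL divergence passed to step().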
API
- class skrl.resources.schedulers.torch.kl_adaptive.KLAdaptiveRL(optimizer: torch.optim.optimizer.Optimizer, kl_threshold: float = 0.008, min_lr: float = 1e-06, max_lr: float = 0.01, kl_factor: float = 2, lr_factor: float = 1.5, last_epoch: int = -1, verbose: bool = False)
Bases:
torch.optim.lr_scheduler._LRScheduler
- __init__(optimizer: torch.optim.optimizer.Optimizer, kl_threshold: float = 0.008, min_lr: float = 1e-06, max_lr: float = 0.01, kl_factor: float = 2, lr_factor: float = 1.5, last_epoch: int = -1, verbose: bool = False) → None
Adaptive KL scheduler
Adjusts the learning rate according to the KL divergence. The implementation is adapted from the rl_games library (https://github.com/Denys88/rl_games/blob/master/rl_games/common/schedulers.py).
Note
This scheduler is only available for PPO at the moment. Applying it to other agents will not change the learning rate.
Example:
>>> scheduler = KLAdaptiveRL(optimizer, kl_threshold=0.01)
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     kl_divergence = ...
>>>     scheduler.step(kl_divergence)
- Parameters
optimizer (torch.optim.Optimizer) – Wrapped optimizer
kl_threshold (float, optional) – Threshold for KL divergence (default: 0.008)
min_lr (float, optional) – Lower bound for learning rate (default: 1e-6)
max_lr (float, optional) – Upper bound for learning rate (default: 1e-2)
kl_factor (float, optional) – The number used to modify the KL divergence threshold (default: 2)
lr_factor (float, optional) – The number used to modify the learning rate (default: 1.5)
last_epoch (int, optional) – The index of the last epoch (default: -1)
verbose (bool, optional) – Verbose mode (default: False)
- get_last_lr()
Return last computed learning rate by current scheduler.
- load_state_dict(state_dict)
Loads the scheduler's state.
- Args:
state_dict (dict): scheduler state. Should be an object returned from a call to state_dict().
- print_lr(is_verbose, group, lr, epoch=None)
Display the current learning rate.
- state_dict()
Returns the state of the scheduler as a dict. It contains an entry for every variable in self.__dict__ which is not the optimizer.
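A minimal sketch of checkpointing the scheduler with these inherited methods (the surrounding model and optimizer checkpointing is omitted):

>>> # save the scheduler state (e.g. alongside the model and optimizer states)
>>> checkpoint = scheduler.state_dict()
>>> # ... later, restore the state into a compatible scheduler instance
>>> scheduler.load_state_dict(checkpoint)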
- step(kl: Optional[Union[torch.Tensor, float]] = None, epoch: Optional[int] = None) → None
Step scheduler
Example:
>>> kl = torch.distributions.kl_divergence(p, q)
>>> kl
tensor([0.0332, 0.0500, 0.0383,  ..., 0.0076, 0.0240, 0.0164])
>>> scheduler.step(kl.mean())

>>> kl = 0.0046
>>> scheduler.step(kl)
- Parameters
kl (torch.Tensor, float, None, optional) – KL divergence (default: None). If None, no adjustment is made. If a tensor, the number of elements must be 1.
epoch (int, optional) – Epoch (default: None)