KL Adaptive¶
Adjust the learning rate according to the value of the Kullback-Leibler (KL) divergence.
Algorithm¶
Algorithm implementation¶
The learning rate (\(\eta\)) at each step is modified according to the value of the KL divergence as follows:

\[\eta \leftarrow \begin{cases} \max(\eta \,/\, \text{lr\_factor}, \; \text{min\_lr}) & \text{if } \; KL > \text{kl\_factor} \cdot \text{kl\_threshold} \\ \min(\eta \cdot \text{lr\_factor}, \; \text{max\_lr}) & \text{if } \; KL < \text{kl\_threshold} \,/\, \text{kl\_factor} \\ \eta & \text{otherwise} \end{cases}\]

where kl_threshold, kl_factor, lr_factor, min_lr and max_lr are the scheduler parameters documented in the API below.
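The same rule, expressed as a minimal standalone Python sketch (illustrative only, not the scheduler's actual implementation; the argument names mirror the parameters documented in the API below):

def kl_adaptive_update(lr, kl, kl_threshold=0.008, min_lr=1e-6, max_lr=1e-2, kl_factor=2.0, lr_factor=1.5):
    """Return the adjusted learning rate for the given KL divergence (illustrative sketch)."""
    if kl > kl_threshold * kl_factor:
        # the policy changed too much: decrease the learning rate
        return max(lr / lr_factor, min_lr)
    if kl < kl_threshold / kl_factor:
        # the policy changed too little: increase the learning rate
        return min(lr * lr_factor, max_lr)
    # otherwise, keep the learning rate unchanged
    return lr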
Usage¶
The learning rate scheduler is set up in each agent's configuration dictionary. The scheduler class is set under the "learning_rate_scheduler" key and its arguments are set under the "learning_rate_scheduler_kwargs" key as a keyword-argument dictionary, without specifying the optimizer (the scheduler's first argument). The following examples show how to set the scheduler for an agent:
# PyTorch
# import the scheduler class
from skrl.resources.schedulers.torch import KLAdaptiveLR

# DEFAULT_CONFIG is the agent's default configuration dictionary (e.g. PPO_DEFAULT_CONFIG)
cfg = DEFAULT_CONFIG.copy()
cfg["learning_rate_scheduler"] = KLAdaptiveLR
cfg["learning_rate_scheduler_kwargs"] = {"kl_threshold": 0.01}
# JAX
# import the scheduler class
from skrl.resources.schedulers.jax import KLAdaptiveLR  # or kl_adaptive (Optax style)

# DEFAULT_CONFIG is the agent's default configuration dictionary (e.g. PPO_DEFAULT_CONFIG)
cfg = DEFAULT_CONFIG.copy()
cfg["learning_rate_scheduler"] = KLAdaptiveLR
cfg["learning_rate_scheduler_kwargs"] = {"kl_threshold": 0.01}
API (PyTorch)¶
- class skrl.resources.schedulers.torch.kl_adaptive.KLAdaptiveLR(*args: Any, **kwargs: Any)¶
Bases: _LRScheduler
- __init__(optimizer: torch.optim.Optimizer, kl_threshold: float = 0.008, min_lr: float = 1e-06, max_lr: float = 0.01, kl_factor: float = 2, lr_factor: float = 1.5, last_epoch: int = -1, verbose: bool = False) → None¶
Adaptive KL scheduler
Adjusts the learning rate according to the KL divergence. The implementation is adapted from the rl_games library (https://github.com/Denys88/rl_games/blob/master/rl_games/common/schedulers.py)
Note
This scheduler is only available for PPO at the moment. Applying it to other agents will not change the learning rate.
Example:
>>> scheduler = KLAdaptiveLR(optimizer, kl_threshold=0.01)
>>> for epoch in range(100):
>>>     # ...
>>>     kl_divergence = ...
>>>     scheduler.step(kl_divergence)
- Parameters:
  - optimizer (torch.optim.Optimizer) – Wrapped optimizer
  - kl_threshold (float, optional) – Threshold for KL divergence (default: 0.008)
  - min_lr (float, optional) – Lower bound for learning rate (default: 1e-6)
  - max_lr (float, optional) – Upper bound for learning rate (default: 1e-2)
  - kl_factor (float, optional) – The number used to modify the KL divergence threshold (default: 2)
  - lr_factor (float, optional) – The number used to modify the learning rate (default: 1.5)
  - last_epoch (int, optional) – The index of the last epoch (default: -1)
  - verbose (bool, optional) – Verbose mode (default: False)
- step(kl: torch.Tensor | float | None = None, epoch: int | None = None) → None¶
Step scheduler
Example:
>>> kl = torch.distributions.kl_divergence(p, q)
>>> kl
tensor([0.0332, 0.0500, 0.0383, ..., 0.0076, 0.0240, 0.0164])
>>> scheduler.step(kl.mean())

>>> kl = 0.0046
>>> scheduler.step(kl)
- Parameters:
  - kl (torch.Tensor, float or None, optional) – KL divergence (default: None). If None, no adjustment is made. If a tensor, the number of elements must be 1
  - epoch (int, optional) – Epoch (default: None)
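For orientation, a hedged sketch of where step() could be called in a hand-written PPO-style update; new_log_prob, old_log_prob, optimizer and scheduler are illustrative assumptions here, and skrl's PPO agent performs this bookkeeping internally:

import torch

# new_log_prob, old_log_prob, optimizer and scheduler are assumed to exist (illustrative only)
# approximate KL between the old and new policies with the (ratio - 1) - log(ratio) estimator
log_ratio = new_log_prob - old_log_prob
kl = ((torch.exp(log_ratio) - 1) - log_ratio).mean()

optimizer.step()     # apply the gradient update first
scheduler.step(kl)   # then adjust the learning rate from the KL estimate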
API (JAX)¶
- class skrl.resources.schedulers.jax.kl_adaptive.KLAdaptiveLR(init_value: float, kl_threshold: float = 0.008, min_lr: float = 1e-06, max_lr: float = 0.01, kl_factor: float = 2, lr_factor: float = 1.5)¶
Bases: object
- __init__(init_value: float, kl_threshold: float = 0.008, min_lr: float = 1e-06, max_lr: float = 0.01, kl_factor: float = 2, lr_factor: float = 1.5) → None¶
Adaptive KL scheduler
Adjusts the learning rate according to the KL divergence. The implementation is adapted from the rl_games library (https://github.com/Denys88/rl_games/blob/master/rl_games/common/schedulers.py)
Note
This scheduler is only available for PPO at the moment. Applying it to other agents will not change the learning rate.
Example:
>>> scheduler = KLAdaptiveLR(init_value=1e-3, kl_threshold=0.01)
>>> for epoch in range(100):
>>>     # ...
>>>     kl_divergence = ...
>>>     scheduler.step(kl_divergence)
>>>     scheduler.lr  # get the updated learning rate
- Parameters:
  - init_value (float) – Initial learning rate
  - kl_threshold (float, optional) – Threshold for KL divergence (default: 0.008)
  - min_lr (float, optional) – Lower bound for learning rate (default: 1e-6)
  - max_lr (float, optional) – Upper bound for learning rate (default: 1e-2)
  - kl_factor (float, optional) – The number used to modify the KL divergence threshold (default: 2)
  - lr_factor (float, optional) – The number used to modify the learning rate (default: 1.5)
- step(kl: ndarray | float | None = None) → None¶
Step scheduler
Example:
>>> kl = [0.0332, 0.0500, 0.0383, 0.0456, 0.0076, 0.0240, 0.0164]
>>> kl
[0.0332, 0.05, 0.0383, 0.0456, 0.0076, 0.024, 0.0164]
>>> scheduler.step(np.mean(kl))

>>> kl = 0.0046
>>> scheduler.step(kl)
- Parameters:
  - kl (np.ndarray, float or None, optional) – KL divergence (default: None). If None, no adjustment is made. If an array, the number of elements must be 1
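A short sketch of driving the JAX scheduler by hand, mirroring the example above (the KL values are placeholders):

from skrl.resources.schedulers.jax import KLAdaptiveLR

scheduler = KLAdaptiveLR(init_value=1e-3, kl_threshold=0.01)

for kl in [0.005, 0.030, 0.008]:  # placeholder KL divergences
    scheduler.step(kl)    # adjust the learning rate from the KL estimate
    print(scheduler.lr)   # read back the updated learning rate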