SKRL - Reinforcement Learning library (2.1.0)¶

skrl is an open-source library for Reinforcement Learning written in Python (implemented in PyTorch, JAX and NVIDIA Warp) and designed with a focus on modularity, readability, simplicity and transparency of algorithm implementation. In addition to supporting the OpenAI Gym, Farama Gymnasium, PettingZoo and ManiSkill environment interfaces, among others, it can load and configure NVIDIA Isaac Lab and MuJoCo Playground environments. It also lets several agents train simultaneously in the same run, each within its own scope (a subset of all available environments); these agents may or may not share resources.

Main features:

PyTorch, JAX and Warp.
Clean code.
Modularity and reusability.
Documented library, code and implementations.
Support for fundamental (Box, Discrete and MultiDiscrete) and composite (Dict and Tuple) spaces.
Support for Gym/Gymnasium (single and vectorized), ManiSkill, MuJoCo Playground, NVIDIA Isaac Lab environments, among others.
Simultaneous learning by scopes in Gym/Gymnasium (vectorized), ManiSkill, MuJoCo Playground, and NVIDIA Isaac Lab environments.
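
As a rough illustration of what "scopes" means in the features above, the following sketch partitions the indices of a vectorized environment into contiguous subsets, one per agent. The helper name `split_into_scopes` and the agent labels are hypothetical and are not part of skrl's API.

```python
def split_into_scopes(num_envs, scope_sizes):
    """Partition environment indices [0, num_envs) into contiguous scopes.

    Hypothetical helper for illustration only; not part of skrl.
    """
    if sum(scope_sizes) != num_envs:
        raise ValueError("scope sizes must add up to the number of environments")
    scopes, start = [], 0
    for size in scope_sizes:
        scopes.append(range(start, start + size))
        start += size
    return scopes


# Three agents share 8 parallel environments: 4 + 2 + 2
scopes = split_into_scopes(8, [4, 2, 2])
for name, scope in zip(["agent_a", "agent_b", "agent_c"], scopes):
    print(name, list(scope))
```

Each agent would then only act on, and learn from, the environments whose indices fall inside its own scope.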

Hint

skrl is under active development. Make sure you always have the latest version.
Visit the develop branch or its documentation to access the latest updates to be released.

GitHub repository: https://github.com/Toni-SM/skrl
Questions or discussions: https://github.com/Toni-SM/skrl/discussions
Paper: skrl: Modular and Flexible Library for Reinforcement Learning.

@article{serrano2023skrl,
  author  = {Antonio Serrano-Muñoz and Dimitrios Chrysostomou and Simon Bøgh and Nestor Arana-Arexolaleiba},
  title   = {skrl: Modular and Flexible Library for Reinforcement Learning},
  journal = {Journal of Machine Learning Research},
  year    = {2023},
  volume  = {24},
  number  = {254},
  pages   = {1--9},
  url     = {http://jmlr.org/papers/v24/23-0112.html}
}

User guide¶

To start using the library, visit the following links:

Library components (overview)¶

Agents¶

Definition of reinforcement learning algorithms that compute an optimal policy. All agents inherit from a single base class that defines a uniform interface and provides common functionality, and that is not tied to the implementation details of any algorithm.

Advantage Actor Critic (A2C)

Adversarial Motion Priors (AMP)

Cross-Entropy Method (CEM)

Deep Deterministic Policy Gradient (DDPG)

Double Deep Q-Network (DDQN)

Deep Q-Network (DQN)

Proximal Policy Optimization (PPO)

Q-learning (Q-learning)

Robust Policy Optimization (RPO)

Soft Actor-Critic (SAC)

State Action Reward State Action (SARSA)

Twin-Delayed DDPG (TD3)

Trust Region Policy Optimization (TRPO)

Multi-agents¶

Definition of reinforcement learning algorithms that compute optimal policies. All multi-agents inherit from a single base class that defines a uniform interface and provides common functionality, and that is not tied to the implementation details of any algorithm.

Independent Proximal Policy Optimization (IPPO)

Multi-Agent Proximal Policy Optimization (MAPPO)

Environments¶

Definition of the Isaac Lab and Playground environment loaders, and wrappers for Gym/Gymnasium, Isaac Lab, ManiSkill, PettingZoo, Playground environments, among others.

Single-agent environment wrapping for Gym/Gymnasium, Isaac Lab, ManiSkill and Playground environments, among others.

Multi-agent environment wrapping for Isaac Lab and PettingZoo environments, among others.

Loading Isaac Lab environments

Loading Playground environments

Memories¶

Generic memory definitions. Such memories are not bound to any agent and can be used for any role such as rollout buffer or experience replay memory, for example. All memories inherit from a base class that defines a uniform interface and keeps track (in allocated tensors) of transitions with the environment or other defined data.

Random memory
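
To make the idea of a generic, agent-independent memory concrete, here is a minimal sketch of a fixed-size buffer with uniform random sampling. It is an assumption-laden toy: the class and its methods are hypothetical, and it stores transitions in plain Python lists rather than in the preallocated tensors the library uses.

```python
import random


class RandomMemory:
    """Toy fixed-size memory with uniform random sampling.

    Illustrative only: skrl stores transitions in preallocated tensors,
    whereas this sketch uses plain Python lists.
    """

    def __init__(self, memory_size):
        self.memory_size = memory_size
        self.storage = [None] * memory_size  # preallocated slots
        self.index = 0       # next slot to write
        self.filled = False  # True once the buffer has wrapped around

    def add(self, **transition):
        # Overwrite the oldest slot once the buffer is full
        self.storage[self.index] = transition
        self.index = (self.index + 1) % self.memory_size
        if self.index == 0:
            self.filled = True

    def __len__(self):
        return self.memory_size if self.filled else self.index

    def sample(self, batch_size):
        # Sample uniformly (with replacement) among the written slots
        return [self.storage[random.randrange(len(self))] for _ in range(batch_size)]


memory = RandomMemory(memory_size=4)
for step in range(6):  # writes 6 transitions, so the 2 oldest are overwritten
    memory.add(observation=step, action=0, reward=1.0, done=False)
batch = memory.sample(batch_size=3)
```

Because nothing in the buffer refers to a particular algorithm, the same object could serve as a rollout buffer (read in order) or as an experience replay memory (read by random sampling).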

Models¶

Definition of helper mixins for the construction of tabular functions or function approximators using artificial neural networks. This library does not provide predefined policies. Instead, it provides helper mixins to create discrete and continuous (stochastic or deterministic) policies in which the user only has to define the tables (tensors) or artificial neural networks. All models inherit from one base class that defines a uniform interface and provides common functionality. In addition, it is possible to create shared models by combining the implemented definitions.

Tabular model (discrete domain)

Categorical model (discrete domain)

Multi-Categorical model (discrete domain)

Gaussian model (continuous domain)

Multivariate Gaussian model (continuous domain)

Deterministic model (continuous domain)
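
The mixin approach described in this section can be sketched in plain Python as follows. All class names and method signatures here are hypothetical stand-ins chosen for illustration, not skrl's actual classes: the point is only that the mixin supplies the action-selection behavior while the user supplies `compute`.

```python
import math
import random


class Model:
    """Hypothetical base class: common interface shared by every model."""

    def __init__(self, observation_size, action_size):
        self.observation_size = observation_size
        self.action_size = action_size

    def compute(self, observation):
        raise NotImplementedError("the user defines the table or network here")


class GaussianMixin:
    """Hypothetical mixin: turns compute() output into a stochastic continuous action."""

    def act(self, observation):
        mean, log_std = self.compute(observation)
        return [random.gauss(m, math.exp(s)) for m, s in zip(mean, log_std)]


class DeterministicMixin:
    """Hypothetical mixin: returns compute() output unchanged."""

    def act(self, observation):
        return self.compute(observation)


class Policy(GaussianMixin, Model):
    # The user only supplies the function approximator (a fixed linear map here)
    def compute(self, observation):
        mean = [0.5 * sum(observation)] * self.action_size
        log_std = [-1.0] * self.action_size
        return mean, log_std


class Value(DeterministicMixin, Model):
    def compute(self, observation):
        return [sum(observation)]


policy = Policy(observation_size=2, action_size=1)
value = Value(observation_size=2, action_size=1)
print(policy.act([1.0, 2.0]), value.act([1.0, 2.0]))
```

In a real model, `compute` would wrap a neural network or a table, and the mixin would be chosen to match the action domain (discrete or continuous, stochastic or deterministic).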

Trainers¶

Definition of the procedures responsible for managing the agent’s training and interaction with the environment. All trainers inherit from a base class that defines a uniform interface and provides common functionality.

Sequential trainer

Parallel trainer

Step trainer
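
What a trainer manages can be pictured as the loop below, which steps one agent through one environment in sequence. This is a simplified sketch under stated assumptions: `CoinFlipEnv`, `RandomAgent` and `train_sequentially` are hypothetical names, and the real trainers handle vectorized environments, scopes, evaluation and logging that are omitted here.

```python
import random


class CoinFlipEnv:
    """Toy environment (hypothetical): guess a coin flip, episodes last 5 steps."""

    def reset(self):
        self.steps = 0
        return 0  # dummy observation

    def step(self, action):
        self.steps += 1
        reward = 1.0 if action == random.randrange(2) else 0.0
        done = self.steps >= 5
        return 0, reward, done


class RandomAgent:
    """Toy agent (hypothetical) exposing the two hooks the trainer relies on."""

    def __init__(self):
        self.recorded = 0

    def act(self, observation):
        return random.randrange(2)

    def record_transition(self, observation, action, reward, next_observation, done):
        self.recorded += 1


def train_sequentially(env, agent, timesteps):
    """Step a single agent through the environment for a fixed number of timesteps."""
    observation = env.reset()
    for _ in range(timesteps):
        action = agent.act(observation)
        next_observation, reward, done = env.step(action)
        agent.record_transition(observation, action, reward, next_observation, done)
        # Start a new episode when the current one ends
        observation = env.reset() if done else next_observation


agent = RandomAgent()
train_sequentially(CoinFlipEnv(), agent, timesteps=12)
```

The trainer only calls the agent's uniform interface, so the same loop works regardless of which algorithm the agent implements.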

Resources¶

Definition of resources used by the agents during training and/or evaluation, such as exploration noises or learning rate schedulers.

Noises: Definition of the noises used by the agents during the exploration stage. All noises inherit from a base class that defines a uniform interface.

Gaussian noise

Ornstein-Uhlenbeck noise
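
The difference between the two noise types can be seen in a short sketch. It uses the textbook definitions (independent Gaussian samples, and an Euler-Maruyama discretization of the Ornstein-Uhlenbeck process); the class names, parameter names and default values are assumptions for illustration and may not match the library's implementation.

```python
import math
import random


class GaussianNoise:
    """Independent samples from N(mean, std^2) at every call."""

    def __init__(self, mean, std):
        self.mean, self.std = mean, std

    def sample(self, size):
        return [random.gauss(self.mean, self.std) for _ in range(size)]


class OrnsteinUhlenbeckNoise:
    """Temporally correlated noise.

    Update rule: x += theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
    """

    def __init__(self, size, theta=0.15, sigma=0.2, mu=0.0, dt=1.0):
        self.theta, self.sigma, self.mu, self.dt = theta, sigma, mu, dt
        self.state = [mu] * size

    def sample(self):
        self.state = [
            x + self.theta * (self.mu - x) * self.dt
            + self.sigma * math.sqrt(self.dt) * random.gauss(0.0, 1.0)
            for x in self.state
        ]
        return self.state


gaussian = GaussianNoise(mean=0.0, std=0.1)
ou = OrnsteinUhlenbeckNoise(size=2)
noisy_action = [a + n for a, n in zip([0.3, -0.3], ou.sample())]
```

Gaussian noise is uncorrelated from one step to the next, whereas each Ornstein-Uhlenbeck sample drifts from the previous one back toward `mu`, which yields smoother exploration in continuous action spaces.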

Learning rate schedulers: Definition of learning rate schedulers.

KL Adaptive
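
A KL-adaptive schedule adjusts the learning rate based on how far the updated policy has moved from the previous one. The sketch below shows one common form of the rule; the function name, the factor of 2 around the threshold and the default values are assumptions for illustration and should be checked against the scheduler's own documentation.

```python
def kl_adaptive_lr(lr, kl, kl_threshold=0.008, lr_factor=1.5, min_lr=1e-6, max_lr=1e-2):
    """Return the next learning rate given the measured KL divergence.

    Hypothetical helper: shrink the rate when the policy changed too much,
    grow it when the policy barely changed, otherwise keep it.
    """
    if kl > 2.0 * kl_threshold:
        return max(lr / lr_factor, min_lr)
    if kl < 0.5 * kl_threshold:
        return min(lr * lr_factor, max_lr)
    return lr


lr = 1e-3
for measured_kl in [0.02, 0.02, 0.001, 0.008]:
    lr = kl_adaptive_lr(lr, measured_kl)
```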

Preprocessors: Definition of preprocessors.

Running standard scaler
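
A running standard scaler normalizes inputs using a mean and variance that are updated as new batches arrive. The following is a minimal pure-Python sketch using the standard formula for combining the statistics of two sample sets; the class layout and the `epsilon` default are assumptions, and the library's version operates on tensors.

```python
import math


class RunningStandardScaler:
    """Track per-feature mean and variance over batches and standardize inputs."""

    def __init__(self, size, epsilon=1e-8):
        self.mean = [0.0] * size
        self.var = [1.0] * size
        self.count = 0
        self.epsilon = epsilon

    def update(self, batch):
        """Merge a batch (list of feature vectors) into the running statistics."""
        n = len(batch)
        total = self.count + n
        for i in range(len(self.mean)):
            column = [row[i] for row in batch]
            batch_mean = sum(column) / n
            batch_var = sum((x - batch_mean) ** 2 for x in column) / n
            delta = batch_mean - self.mean[i]
            # Combine sums of squared deviations of the old data and the new batch
            m2 = self.var[i] * self.count + batch_var * n + delta**2 * self.count * n / total
            self.mean[i] += delta * n / total
            self.var[i] = m2 / total
        self.count = total

    def __call__(self, x):
        return [(v - m) / (math.sqrt(s) + self.epsilon) for v, m, s in zip(x, self.mean, self.var)]


scaler = RunningStandardScaler(size=1)
scaler.update([[1.0], [3.0]])
scaler.update([[5.0], [7.0]])
scaled = scaler([4.0])
```

After both updates the statistics match those of the full data set [1, 3, 5, 7], without the scaler ever having stored the individual samples.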

Optimizers: Definition of optimizers.

Adam

Utils and configurations¶

Definition of utilities and configurations.

ML frameworks configuration

Random seed

Spaces

Model instantiators

Runner

TensorBoard SummaryWriter

Distributed runs

Memory and TensorBoard file post-processing

Hugging Face integration