Saving, loading and logging

Tracking metrics (TensorBoard)

TensorBoard is used for tracking and visualizing metrics and scalars (coefficients, losses, etc.). Tracking and writing them is the responsibility of the agents, so it can be customized independently for each agent through its configuration dictionary.


Each agent offers the following parameters under the "experiment" key:


    "experiment": {
        "directory": "",            # experiment's parent directory
        "experiment_name": "",      # experiment name
        "write_interval": 250,      # TensorBoard writing interval (timesteps)

        "checkpoint_interval": 1000,        # interval for checkpoints (timesteps)
        "store_separately": False,          # whether to store checkpoints separately

        "wandb": False,             # whether to use Weights & Biases
        "wandb_kwargs": {}          # wandb kwargs (passed to wandb.init)
    }

  • directory: directory path where the data generated by the experiments (a subdirectory) are stored. If no value is set, the runs folder (inside the current working directory) will be used (and created if it does not exist).

  • experiment_name: name of the experiment (subdirectory). If no value is set, it will be the current date and time and the agent’s name (e.g. 22-01-09_22-48-49-816281_DDPG).

  • write_interval: interval for writing metrics and values to TensorBoard (default is 250 timesteps). A value equal to or less than 0 disables tracking and writing to TensorBoard.
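
As a minimal sketch of the defaults described above (not the library's implementation), the following snippet resolves the directory where a run's data would be stored and checks whether tracking is enabled. The helpers resolve_experiment_dir and tracking_enabled and the cfg values are hypothetical and used only for illustration; the timestamp format is inferred from the example name above.

```python
import datetime
import os

# Hypothetical agent configuration: only the "experiment" key is shown
cfg = {
    "experiment": {
        "directory": "",          # empty: fall back to ./runs
        "experiment_name": "",    # empty: fall back to <date-time>_<agent name>
        "write_interval": 250,
        "checkpoint_interval": 1000,
    }
}


def resolve_experiment_dir(experiment_cfg, agent_name, now=None):
    """Illustrative helper (not part of skrl): apply the documented defaults."""
    now = now or datetime.datetime.now()
    directory = experiment_cfg.get("directory") or os.path.join(os.getcwd(), "runs")
    name = experiment_cfg.get("experiment_name")
    if not name:
        # Format inferred from the example 22-01-09_22-48-49-816281_DDPG
        name = "{}_{}".format(now.strftime("%y-%m-%d_%H-%M-%S-%f"), agent_name)
    return os.path.join(directory, name)


def tracking_enabled(experiment_cfg):
    """A write interval equal to or less than 0 disables TensorBoard tracking."""
    return experiment_cfg.get("write_interval", 0) > 0


print(resolve_experiment_dir(cfg["experiment"], "DDPG"))
print(tracking_enabled(cfg["experiment"]))
```

Setting "directory" and "experiment_name" explicitly makes the output location reproducible across runs, which is convenient when a later script needs to find the checkpoints.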

Tracked metrics/scalars visualization

To visualize the tracked metrics/scalars during or after training, launch TensorBoard with the following command in a terminal:

tensorboard --logdir=PATH_TO_RUNS_DIRECTORY

(Figure: TensorBoard panel)

The following metrics/scalars are tracked by the agents. Depending on the agent, a metric is tracked all the time or only when the corresponding function is enabled in the agent’s configuration:

  • Entropy coefficient
  • Return threshold
  • Mean disc. returns
  • Total timesteps
  • Exploration noise
  • Exploration epsilon
  • Learning rate
  • Policy learning rate
  • Critic learning rate
  • Critic loss
  • Entropy loss
  • Discriminator loss
  • Policy loss
  • Q-network loss
  • Value loss
  • Standard deviation
  • Instantaneous reward
  • Total reward

Tracking custom metrics/scalars

  • Tracking custom data attached to the agent’s control and timing logic (recommended)

    Although TensorBoard’s writing control and timing logic is handled by the base class Agent, custom data can still be tracked through it. Use the track_data method (see the Agent class for more details), passing the data identification (tag) and the scalar value to be recorded as arguments.

    For example, to track the current CPU usage, the following code can be used:

    import psutil

    # assuming agent is an instance of an Agent subclass
    agent.track_data("Resource / CPU usage", psutil.cpu_percent())

  • Tracking custom data directly to TensorBoard

    It is also possible to access the SummaryWriter instance directly through the writer property in order to write to TensorBoard without going through the base class’s control and timing logic.

    For example, to write directly to TensorBoard:

    import psutil

    # assuming agent is an instance of an Agent subclass
    agent.writer.add_scalar("Resource / CPU usage", psutil.cpu_percent(), global_step=1000)
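
The difference between the two approaches can be sketched with stand-in classes. This is an illustration only, not the skrl implementation: RecordingWriter and BufferedTracker are hypothetical names, and averaging the buffered values at each write interval is an assumption made for the example.

```python
import collections
import statistics


class RecordingWriter:
    """Stand-in for a SummaryWriter: stores add_scalar calls in a list."""

    def __init__(self):
        self.records = []

    def add_scalar(self, tag, value, global_step=None):
        self.records.append((tag, value, global_step))


class BufferedTracker:
    """Hypothetical tracker: buffers values per tag and writes their mean."""

    def __init__(self, writer, write_interval):
        self.writer = writer
        self.write_interval = write_interval
        self.buffer = collections.defaultdict(list)

    def track_data(self, tag, value):
        # Only buffer the value; writing is decided later by post_step
        self.buffer[tag].append(value)

    def post_step(self, timestep):
        # Write at multiples of the interval; a value <= 0 disables writing
        if self.write_interval > 0 and timestep % self.write_interval == 0:
            for tag, values in self.buffer.items():
                self.writer.add_scalar(tag, statistics.mean(values), global_step=timestep)
            self.buffer.clear()


writer = RecordingWriter()
tracker = BufferedTracker(writer, write_interval=2)
for timestep, usage in enumerate([10.0, 20.0, 30.0, 40.0], start=1):
    tracker.track_data("Resource / CPU usage", usage)
    tracker.post_step(timestep)

# Direct write: bypasses both the buffer and the write interval
writer.add_scalar("Resource / CPU usage", 55.0, global_step=1000)
print(writer.records)
```

In this sketch the buffered path produces one record per write interval, whereas the direct path produces a record on every call, which is why going through the agent is usually the better fit for values sampled at every timestep.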

Tracking metrics (Weights and Biases)

Weights & Biases is also supported for tracking and visualizing metrics and scalars. Its configuration is the responsibility of the agents (it can be customized independently for each agent using its configuration dictionary).

Follow the steps described in the Weights & Biases documentation (Set up wandb) to log in to the wandb library on the current machine.


Each agent offers the following parameters under the "experiment" key. Visit the Weights & Biases documentation for more details about the configuration parameters.


  • wandb: whether to enable support for Weights & Biases.

  • wandb_kwargs: keyword argument dictionary used to parameterize the wandb.init function. If the following parameters are not provided, they are set as described:

    • "name": will be set to the name of the experiment directory.

    • "sync_tensorboard": will be set to True.

    • "config": will be updated with the configuration dictionaries of both the agent (and its models) and the trainer. The update will be done even if a value has been set for the parameter.
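
The rules above can be sketched as follows. The helper build_wandb_kwargs is hypothetical, and merging the agent and trainer configurations as flat dictionaries is an assumption made for illustration; the library may structure the resulting "config" entry differently.

```python
import copy


def build_wandb_kwargs(user_kwargs, experiment_name, agent_cfg, trainer_cfg):
    """Illustrative helper (not part of skrl or wandb)."""
    kwargs = copy.deepcopy(user_kwargs)  # do not mutate the user's dictionary
    # Defaults are applied only when the user did not provide a value
    kwargs.setdefault("name", experiment_name)
    kwargs.setdefault("sync_tensorboard", True)
    # "config" is always updated, even if the user provided one
    kwargs.setdefault("config", {})
    kwargs["config"].update(agent_cfg)
    kwargs["config"].update(trainer_cfg)
    return kwargs


kwargs = build_wandb_kwargs(
    user_kwargs={"project": "my-project", "config": {"note": "baseline"}},
    experiment_name="22-01-09_22-48-49-816281_DDPG",
    agent_cfg={"learning_rate": 1e-3},
    trainer_cfg={"timesteps": 10000},
)
print(kwargs)
# The resulting dictionary would then be passed on as wandb.init(**kwargs)
```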


Saving checkpoints

The checkpoints are saved in the checkpoints subdirectory of the experiment’s directory (its path can be customized using the options described in the previous subsection). The checkpoint name is the key referring to the agent (or models, optimizers and preprocessors) and the current timestep (e.g. runs/22-01-09_22-48-49-816281_DDPG/checkpoints/

The checkpoint management, as in the previous case, is the responsibility of the agents (can be customized independently for each agent using its configuration dictionary).


  • checkpoint_interval: interval for checkpoints (default is 1000 timesteps). A value equal to or less than 0 disables the checkpoint creation.

  • store_separately: if set to True, each module that an agent contains (models, optimizers, preprocessors, etc.) is saved in a separate file. By default (False), the modules are grouped in a dictionary and stored in the same file.
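
A rough sketch of how these two options interact is shown below. The helper names, the underscore separator, the .pt extension and the exact timestep at which a checkpoint is triggered are assumptions made for illustration, not details confirmed by this page.

```python
def should_checkpoint(timestep, checkpoint_interval):
    """A checkpoint interval equal to or less than 0 disables checkpoints."""
    return checkpoint_interval > 0 and timestep % checkpoint_interval == 0


def checkpoint_files(modules, timestep, store_separately, extension="pt"):
    """Return {file name: content} for one checkpoint (illustrative only)."""
    if store_separately:
        # One file per module: <module key>_<timestep>
        return {
            "{}_{}.{}".format(name, timestep, extension): module
            for name, module in modules.items()
        }
    # Single file: all modules grouped in one dictionary under the agent key
    return {"agent_{}.{}".format(timestep, extension): dict(modules)}


# Placeholder module contents (real ones would be state dictionaries)
modules = {"policy": {"w": 0.1}, "value": {"w": 0.2}, "optimizer": {"lr": 1e-3}}

for timestep in (500, 1000, 2000):
    if should_checkpoint(timestep, checkpoint_interval=1000):
        print(timestep, sorted(checkpoint_files(modules, timestep, store_separately=True)))
```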

Checkpointing the best models

The best models, according to the mean total reward, will be saved in the checkpoints subdirectory of the experiment’s directory. The checkpoint name is the word best and the key referring to the model (e.g. runs/22-01-09_22-48-49-816281_DDPG/checkpoints/

The best models are updated internally on each TensorBoard writing interval "write_interval" and they are saved on each checkpoint interval "checkpoint_interval". The "store_separately" key specifies whether the best modules are grouped and stored together or separately.
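
The two-stage behavior (update on the write interval, save on the checkpoint interval) can be sketched as follows. BestModelTracker is a hypothetical class written for this illustration, and the file names it returns are assumptions.

```python
import copy


class BestModelTracker:
    """Hypothetical helper: remembers the modules with the best mean total reward."""

    def __init__(self):
        self.best_reward = float("-inf")
        self.best_modules = None
        self.saved = True  # nothing pending to save yet

    def on_write_interval(self, mean_total_reward, modules):
        # Keep an independent snapshot only when the reward improves
        if mean_total_reward > self.best_reward:
            self.best_reward = mean_total_reward
            self.best_modules = copy.deepcopy(modules)
            self.saved = False

    def on_checkpoint_interval(self, store_separately=False):
        """Return {file name: content} to write, or {} if nothing changed."""
        if self.saved:
            return {}
        self.saved = True
        if store_separately:
            return {"best_{}.pt".format(k): m for k, m in self.best_modules.items()}
        return {"best_agent.pt": self.best_modules}


tracker = BestModelTracker()
modules = {"policy": {"w": 1}}
tracker.on_write_interval(10.0, modules)   # new best: snapshot taken
modules["policy"]["w"] = 2                 # training continues
tracker.on_write_interval(5.0, modules)    # worse: snapshot kept
files = tracker.on_checkpoint_interval()
print(files)
```

Taking the snapshot at the write interval rather than at the checkpoint interval means the saved modules correspond to the moment the best reward was observed, not to whatever the modules look like when the file is eventually written.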

Loading checkpoints

Checkpoints can be loaded for each of the instantiated agents (or models) independently via the .load(...) method (Agent.load or Model.load). It accepts the path (relative or absolute) of the checkpoint to load as its only argument. The checkpoint is dynamically mapped to the device specified as an argument in the class constructor (internally, the map_location parameter of torch.load is used during loading).


The agent or model instances must have the same architecture/structure as the one used to save the checkpoint. The current implementation loads the model’s state_dict directly.

The following code snippets show how to load the checkpoints through the instantiated agent (recommended) or models. See the Examples section for showcases of how to load checkpoints and use them to continue the training or evaluate experiments.

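from skrl.agents.torch.ppo import PPO

# Instantiate the agent
# (models, memory, agent_cfg and env are assumed to be defined beforehand)
agent = PPO(models=models,  # models dict
            memory=memory,  # memory instance, or None if not required
            cfg=agent_cfg,  # configuration dict (preprocessors, learning rate schedulers, etc.)
            observation_space=env.observation_space,
            action_space=env.action_space,
            device=env.device)

# Load the checkpoint (PATH_TO_CHECKPOINT is a placeholder for the checkpoint file path)
agent.load("PATH_TO_CHECKPOINT")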

In addition, trained agent checkpoints can be loaded from the Hugging Face Hub through the library utilities. See the Hugging Face integration for more information.

from skrl.agents.torch.ppo import PPO
from skrl.utils.huggingface import download_model_from_huggingface

# Instantiate the agent
# (models, memory, agent_cfg and env are assumed to be defined beforehand)
agent = PPO(models=models,  # models dict
            memory=memory,  # memory instance, or None if not required
            cfg=agent_cfg,  # configuration dict (preprocessors, learning rate schedulers, etc.)
            observation_space=env.observation_space,
            action_space=env.action_space,
            device=env.device)

# Load the checkpoint from Hugging Face Hub
path = download_model_from_huggingface("skrl/OmniIsaacGymEnvs-Cartpole-PPO")
agent.load(path)

Migrating external checkpoints

It is possible to load checkpoints generated with external reinforcement learning libraries into skrl agents (or models) via the .migrate(...) method (Agent.migrate or Model.migrate).


In some cases it will be necessary to specify a parameter mapping, especially in ambiguous models (where 2 or more parameters, for source or current model, have equal shape). Refer to the respective method documentation for more details in these cases.
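
Why equal shapes make a model ambiguous can be illustrated with a simplified shape-based matcher. This is not the algorithm used by the migrate methods; the function auto_map_by_shape and all parameter names below are hypothetical.

```python
import collections


def auto_map_by_shape(source_shapes, target_shapes):
    """Match target parameters to source parameters by unique shape.

    Returns (mapping, unresolved): mapping is {target name: source name};
    unresolved lists the target names that need an explicit mapping.
    """
    source_by_shape = collections.defaultdict(list)
    for name, shape in source_shapes.items():
        source_by_shape[tuple(shape)].append(name)
    target_counts = collections.Counter(tuple(s) for s in target_shapes.values())

    mapping, unresolved = {}, []
    for name, shape in target_shapes.items():
        candidates = source_by_shape.get(tuple(shape), [])
        # Safe only if the shape is unique on both sides
        if len(candidates) == 1 and target_counts[tuple(shape)] == 1:
            mapping[name] = candidates[0]
        else:
            unresolved.append(name)
    return mapping, unresolved


# Hypothetical parameter names and shapes of an external checkpoint (source)
source = {"actor.0.weight": (64, 4), "actor.2.weight": (64, 64),
          "mu.weight": (1, 64), "value.weight": (1, 64)}
# Hypothetical parameter names and shapes of the current model (target)
target = {"net.0.weight": (64, 4), "net.2.weight": (64, 64),
          "policy_layer.weight": (1, 64), "value_layer.weight": (1, 64)}

mapping, unresolved = auto_map_by_shape(source, target)
print(mapping)
print(unresolved)

# The two (1, 64) parameters cannot be told apart by shape alone,
# so an explicit mapping is supplied for them
mapping.update({"policy_layer.weight": "mu.weight", "value_layer.weight": "value.weight"})
```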

The following code snippets show how to migrate checkpoints from other libraries to the agents or models implemented in skrl:

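from skrl.agents.torch.ppo import PPO

# Instantiate the agent
# (models, memory, agent_cfg and env are assumed to be defined beforehand)
agent = PPO(models=models,  # models dict
            memory=memory,  # memory instance, or None if not required
            cfg=agent_cfg,  # configuration dict (preprocessors, learning rate schedulers, etc.)
            observation_space=env.observation_space,
            action_space=env.action_space,
            device=env.device)

# Migrate a rl_games checkpoint (PATH_TO_RL_GAMES_CHECKPOINT is a placeholder)
agent.migrate(path="PATH_TO_RL_GAMES_CHECKPOINT")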

Memory export/import

Exporting memories

Memories can be automatically exported to files at each filling cycle (before data overwriting is performed). Whether the export is enabled, the format of the output files and the directory where they are written are set through constructor parameters when the instance is created.

from skrl.memories.torch import RandomMemory

# Instantiate a memory and enable its export
memory = RandomMemory(memory_size=16,
                      export=True,
                      export_format="pt",
                      export_directory="memory")

  • export: enable or disable the memory export (default is disabled).

  • export_format: the format of the exported memory (default is "pt"). Supported formats are PyTorch ("pt"), NumPy ("np") and Comma-separated values ("csv").

  • export_directory: the directory where the memory will be exported (default is "memory").
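
The idea of exporting at each filling cycle, before the stored data is overwritten, can be sketched with a small stand-alone class. TinyMemory is hypothetical and only mimics the behavior described above for the "csv" format; the exported file names and columns are assumptions, not the library's actual output layout.

```python
import csv
import os
import tempfile


class TinyMemory:
    """Hypothetical fixed-size memory that exports its content when full."""

    def __init__(self, memory_size, export=False, export_directory="memory"):
        self.memory_size = memory_size
        self.export = export
        self.export_directory = export_directory
        self.rows = []
        self.cycle = 0  # number of completed filling cycles

    def add_sample(self, state, action, reward):
        self.rows.append((state, action, reward))
        if len(self.rows) == self.memory_size:
            # Memory is full: export (if enabled) before the data is overwritten
            if self.export:
                self._export_csv()
            self.rows = []
            self.cycle += 1

    def _export_csv(self):
        os.makedirs(self.export_directory, exist_ok=True)
        path = os.path.join(self.export_directory, "memory_{}.csv".format(self.cycle))
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["state", "action", "reward"])
            writer.writerows(self.rows)


# Export into a temporary directory so the example leaves the working directory clean
export_dir = os.path.join(tempfile.mkdtemp(), "memory")
memory = TinyMemory(memory_size=2, export=True, export_directory=export_dir)
for step in range(4):
    memory.add_sample(state=step, action=0, reward=1.0)

exported = sorted(os.listdir(export_dir))
print(exported)
```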

Importing memories

