Runner¶

Utility that configures and instantiates skrl’s components to run training/evaluation workflows in a few lines of code.

Usage¶

Hint

The Runner classes encapsulate, and greatly simplify, the definitions and instantiations needed to execute RL tasks. However, this simplification hides the underlying code (models, agents, etc.), which makes it harder to read and modify.

For more control over the RL system setup, and more readable code, refer to the training scripts in the Examples section (recommended!).

from skrl.utils.runner.torch import Runner
from skrl.envs.wrappers.torch import wrap_env

# load and wrap some environment
env = ...
env = wrap_env(env)

# load the experiment config and instantiate the runner
cfg = Runner.load_cfg_from_yaml("path/to/cfg.yaml")
runner = Runner(env, cfg)

# load a checkpoint to continue training or for evaluation (optional)
runner.agent.load("path/to/checkpoints/agent.pt")

# run the training
runner.run("train")  # or "eval" for evaluation

seed: 42


# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
models:
  separate: False
  policy:  # see gaussian_model parameters
    class: GaussianMixin
    clip_actions: False
    clip_log_std: True
    min_log_std: -20.0
    max_log_std: 2.0
    initial_log_std: 0.0
    network:
      - name: net
        input: OBSERVATIONS
        layers: [32, 32]
        activations: elu
    output: ACTIONS
  value:  # see deterministic_model parameters
    class: DeterministicMixin
    clip_actions: False
    network:
      - name: net
        input: OBSERVATIONS
        layers: [32, 32]
        activations: elu
    output: ONE


# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
  class: RandomMemory
  memory_size: -1  # automatically determined (same as agent:rollouts)


# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
  class: PPO
  rollouts: 32
  learning_epochs: 8
  mini_batches: 8
  discount_factor: 0.99
  gae_lambda: 0.95
  learning_rate: 5.0e-04
  learning_rate_scheduler: KLAdaptiveLR
  learning_rate_scheduler_kwargs:
    kl_threshold: 0.008
  observation_preprocessor: RunningStandardScaler
  observation_preprocessor_kwargs: null
  state_preprocessor: null
  state_preprocessor_kwargs: null
  value_preprocessor: RunningStandardScaler
  value_preprocessor_kwargs: null
  random_timesteps: 0
  learning_starts: 0
  grad_norm_clip: 1.0
  ratio_clip: 0.2
  value_clip: 0.2
  entropy_loss_scale: 0.0
  value_loss_scale: 2.0
  kl_threshold: 0.0
  time_limit_bootstrap: False
  rewards_shaper_scale: 0.1
  # logging and checkpoint
  experiment:
    directory: "cartpole_direct"
    experiment_name: ""
    write_interval: auto
    checkpoint_interval: auto


# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
  class: SequentialTrainer
  timesteps: 4800
  environment_info: log
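Since the runner takes its configuration as a plain dictionary (`cfg: dict[str, Any]`), values loaded from the YAML file can be adjusted in Python before the runner is instantiated. The following is a minimal sketch of that idea, not part of skrl: `with_overrides` is a hypothetical helper, and the small `cfg` literal only stands in for the dictionary returned by `Runner.load_cfg_from_yaml`.

```python
from copy import deepcopy
from typing import Any


def with_overrides(cfg: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of ``cfg`` with dotted-key overrides applied (hypothetical helper)."""
    new_cfg = deepcopy(cfg)  # leave the loaded configuration untouched
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = new_cfg
        for key in parents:
            node = node.setdefault(key, {})  # create missing sections on the way
        node[leaf] = value
    return new_cfg


# stand-in for the dict returned by Runner.load_cfg_from_yaml("path/to/cfg.yaml")
cfg = {
    "seed": 42,
    "agent": {"class": "PPO", "rollouts": 32, "experiment": {"directory": "cartpole_direct"}},
    "trainer": {"class": "SequentialTrainer", "timesteps": 4800},
}

tuned = with_overrides(cfg, {"agent.rollouts": 64, "trainer.timesteps": 9600})
print(tuned["agent"]["rollouts"], tuned["trainer"]["timesteps"])
# runner = Runner(env, tuned)  # then instantiate the runner as in the script above
```

Working on a copy keeps the loaded configuration available for comparison or for launching several variants from the same file.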

from skrl.utils.runner.jax import Runner
from skrl.envs.wrappers.jax import wrap_env

# load and wrap some environment
env = ...
env = wrap_env(env)

# load the experiment config and instantiate the runner
cfg = Runner.load_cfg_from_yaml("path/to/cfg.yaml")
runner = Runner(env, cfg)

# load a checkpoint to continue training or for evaluation (optional)
runner.agent.load("path/to/checkpoints/agent.pickle")

# run the training
runner.run("train")  # or "eval" for evaluation

seed: 42


# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
models:
  separate: False
  policy:  # see gaussian_model parameters
    class: GaussianMixin
    clip_actions: False
    clip_log_std: True
    min_log_std: -20.0
    max_log_std: 2.0
    initial_log_std: 0.0
    network:
      - name: net
        input: OBSERVATIONS
        layers: [32, 32]
        activations: elu
    output: ACTIONS
  value:  # see deterministic_model parameters
    class: DeterministicMixin
    clip_actions: False
    network:
      - name: net
        input: OBSERVATIONS
        layers: [32, 32]
        activations: elu
    output: ONE


# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
  class: RandomMemory
  memory_size: -1  # automatically determined (same as agent:rollouts)


# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
  class: PPO
  rollouts: 32
  learning_epochs: 8
  mini_batches: 8
  discount_factor: 0.99
  gae_lambda: 0.95
  learning_rate: 5.0e-04
  learning_rate_scheduler: KLAdaptiveLR
  learning_rate_scheduler_kwargs:
    kl_threshold: 0.008
  observation_preprocessor: RunningStandardScaler
  observation_preprocessor_kwargs: null
  state_preprocessor: null
  state_preprocessor_kwargs: null
  value_preprocessor: RunningStandardScaler
  value_preprocessor_kwargs: null
  random_timesteps: 0
  learning_starts: 0
  grad_norm_clip: 1.0
  ratio_clip: 0.2
  value_clip: 0.2
  entropy_loss_scale: 0.0
  value_loss_scale: 2.0
  kl_threshold: 0.0
  time_limit_bootstrap: False
  rewards_shaper_scale: 0.1
  # logging and checkpoint
  experiment:
    directory: "cartpole_direct"
    experiment_name: ""
    write_interval: auto
    checkpoint_interval: auto


# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
  class: SequentialTrainer
  timesteps: 4800
  environment_info: log

from skrl.utils.runner.warp import Runner
from skrl.envs.wrappers.warp import wrap_env

# load and wrap some environment
env = ...
env = wrap_env(env)

# load the experiment config and instantiate the runner
cfg = Runner.load_cfg_from_yaml("path/to/cfg.yaml")
runner = Runner(env, cfg)

# load a checkpoint to continue training or for evaluation (optional)
runner.agent.load("path/to/checkpoints/agent.pickle")

# run the training
runner.run("train")  # or "eval" for evaluation

seed: 42


# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
models:
  separate: False
  policy:  # see gaussian_model parameters
    class: GaussianMixin
    clip_actions: False
    clip_log_std: True
    min_log_std: -20.0
    max_log_std: 2.0
    initial_log_std: 0.0
    network:
      - name: net
        input: OBSERVATIONS
        layers: [32, 32]
        activations: elu
    output: ACTIONS
  value:  # see deterministic_model parameters
    class: DeterministicMixin
    clip_actions: False
    network:
      - name: net
        input: OBSERVATIONS
        layers: [32, 32]
        activations: elu
    output: ONE


# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
  class: RandomMemory
  memory_size: -1  # automatically determined (same as agent:rollouts)


# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
  class: PPO
  rollouts: 32
  learning_epochs: 8
  mini_batches: 8
  discount_factor: 0.99
  gae_lambda: 0.95
  learning_rate: 5.0e-04
  learning_rate_scheduler: KLAdaptiveLR
  learning_rate_scheduler_kwargs:
    kl_threshold: 0.008
  observation_preprocessor: RunningStandardScaler
  observation_preprocessor_kwargs: null
  state_preprocessor: null
  state_preprocessor_kwargs: null
  value_preprocessor: RunningStandardScaler
  value_preprocessor_kwargs: null
  random_timesteps: 0
  learning_starts: 0
  grad_norm_clip: 1.0
  ratio_clip: 0.2
  value_clip: 0.2
  entropy_loss_scale: 0.0
  value_loss_scale: 2.0
  kl_threshold: 0.0
  time_limit_bootstrap: False
  rewards_shaper_scale: 0.1
  # logging and checkpoint
  experiment:
    directory: "cartpole_direct"
    experiment_name: ""
    write_interval: auto
    checkpoint_interval: auto


# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
  class: SequentialTrainer
  timesteps: 4800
  environment_info: log
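The three configurations above share the same PPO and trainer settings, so it can help to work out what they imply for the amount of training. The sketch below is only a back-of-the-envelope calculation under stated assumptions: that the agent performs one update every `rollouts` timesteps, that the collected samples are split evenly into `mini_batches`, and that no update is cut short. The function name and the number of parallel environments (64) are hypothetical and do not come from the configuration.

```python
def ppo_update_budget(timesteps: int, rollouts: int, learning_epochs: int,
                      mini_batches: int, num_envs: int) -> dict[str, int]:
    """Estimate how much learning a PPO run performs (illustrative arithmetic only)."""
    updates = timesteps // rollouts                # one update per `rollouts` timesteps (assumed)
    samples_per_update = rollouts * num_envs       # transitions collected before each update
    mini_batch_size = samples_per_update // mini_batches
    gradient_steps = updates * learning_epochs * mini_batches
    return {
        "updates": updates,
        "samples_per_update": samples_per_update,
        "mini_batch_size": mini_batch_size,
        "gradient_steps": gradient_steps,
    }


# values from the example config; num_envs=64 is an assumed environment count
budget = ppo_update_budget(timesteps=4800, rollouts=32, learning_epochs=8,
                           mini_batches=8, num_envs=64)
print(budget)
```

Under these assumptions, 4800 timesteps with 32 rollouts give 150 updates, each running 8 epochs over 8 mini-batches.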

API¶

PyTorch¶

Runner

Experiment runner.

class skrl.utils.runner.torch.Runner(env: Wrapper | MultiAgentEnvWrapper, cfg: dict[str, Any], *, verbose: bool = False)[source]¶

Bases: object

Experiment runner.

Configure and instantiate skrl components to execute training/evaluation workflows in a few lines of code.

Parameters:

env – Environment to train on.
cfg – Runner configuration.
verbose – Whether to print extra information about the setup.

Methods:

`load_cfg_from_yaml`(path)	Load a runner configuration from a yaml file.
`run`([mode])	Run the training/evaluation.

Attributes:

`agent`	Agent instance.
`trainer`	Trainer instance.

static load_cfg_from_yaml(path: str) → dict[source]¶

Load a runner configuration from a yaml file.

Parameters: path – File path.
Returns: Loaded configuration, or an empty dict if an error has occurred.
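Because a failed load is reported as an empty dictionary rather than an exception, it may be worth checking the result before building the runner. The following is a minimal sketch of such a check; `ensure_valid_cfg` and `EXPECTED_SECTIONS` are hypothetical names, and the list of sections is taken from the example configuration rather than from a documented requirement.

```python
from typing import Any

# Top-level sections used by the example configuration. Whether the Runner
# strictly requires each of them is an assumption of this sketch.
EXPECTED_SECTIONS = ("models", "memory", "agent", "trainer")


def ensure_valid_cfg(cfg: dict[str, Any], path: str) -> dict[str, Any]:
    """Raise early if a loaded configuration is empty or incomplete (hypothetical helper)."""
    if not cfg:
        raise ValueError(f"Could not load a runner configuration from '{path}'")
    missing = [name for name in EXPECTED_SECTIONS if name not in cfg]
    if missing:
        raise ValueError(f"Configuration '{path}' lacks sections: {', '.join(missing)}")
    return cfg


# intended usage: cfg = ensure_valid_cfg(Runner.load_cfg_from_yaml(path), path)
try:
    ensure_valid_cfg({}, "path/to/cfg.yaml")  # {} stands in for a failed load
except ValueError as error:
    print(error)
```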

run(mode: Literal['train', 'eval'] = 'train') → None[source]¶

Run the training/evaluation.

Parameters: mode – Running mode: "train" for training or "eval" for evaluation.
Raises: ValueError – The specified running mode is not valid.
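When the mode comes from the command line, restricting it to the two documented values catches typos before any environment or model is created, instead of waiting for `run` to raise `ValueError`. The following is a sketch using the standard library's `argparse`; the `--mode` flag and the `parse_mode` function are hypothetical and not part of skrl.

```python
import argparse
from typing import Optional, Sequence


def parse_mode(argv: Optional[Sequence[str]] = None) -> str:
    """Parse a running mode limited to the values accepted by Runner.run (hypothetical CLI)."""
    parser = argparse.ArgumentParser(description="Run an experiment with a skrl Runner")
    parser.add_argument("--mode", choices=("train", "eval"), default="train",
                        help='"train" for training or "eval" for evaluation')
    return parser.parse_args(argv).mode


mode = parse_mode(["--mode", "eval"])
print(mode)
# runner.run(mode)  # with a runner created as in the Usage section
```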

property agent: Agent[source]¶

Agent instance.

property trainer: Trainer[source]¶

Trainer instance.

JAX¶

Runner

Experiment runner.

class skrl.utils.runner.jax.Runner(env: Wrapper | MultiAgentEnvWrapper, cfg: dict[str, Any], *, verbose: bool = False)[source]¶

Bases: object

Experiment runner.

Configure and instantiate skrl components to execute training/evaluation workflows in a few lines of code.

Parameters:

env – Environment to train on.
cfg – Runner configuration.
verbose – Whether to print extra information about the setup.

Methods:

`load_cfg_from_yaml`(path)	Load a runner configuration from a yaml file.
`run`([mode])	Run the training/evaluation.

Attributes:

`agent`	Agent instance.
`trainer`	Trainer instance.

static load_cfg_from_yaml(path: str) → dict[source]¶

Load a runner configuration from a yaml file.

Parameters: path – File path.
Returns: Loaded configuration, or an empty dict if an error has occurred.

run(mode: Literal['train', 'eval'] = 'train') → None[source]¶

Run the training/evaluation.

Parameters: mode – Running mode: "train" for training or "eval" for evaluation.
Raises: ValueError – The specified running mode is not valid.

property agent: Agent[source]¶

Agent instance.

property trainer: Trainer[source]¶

Trainer instance.

Warp¶

Runner

Experiment runner.