Runner¶
Utility that configures and instantiates skrl’s components to run training/evaluation workflows in a few lines of code.
Usage¶
Hint
The Runner classes encapsulate, and greatly simplify, the definitions and instantiations
needed to execute RL tasks. However, this simplification hides the underlying code
(models, agents, etc.) and makes it harder to read and modify.
For more control over the RL system setup, and more readable code, refer to the training scripts in the Examples section (recommended!).
from skrl.utils.runner.torch import Runner
from skrl.envs.wrappers.torch import wrap_env
# load and wrap some environment
env = ...
env = wrap_env(env)
# load the experiment config and instantiate the runner
cfg = Runner.load_cfg_from_yaml("path/to/cfg.yaml")
runner = Runner(env, cfg)
# load a checkpoint to continue training or for evaluation (optional)
runner.agent.load("path/to/checkpoints/agent.pt")
# run the training
runner.run("train") # or "eval" for evaluation
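The snippet above hard-codes the config path, the checkpoint and the running mode. A minimal sketch of exposing them as command-line options could look like the following. The flag names (`--cfg`, `--checkpoint`, `--mode`) and the `parse_args` helper are illustrative assumptions, not part of skrl; only the commented-out lines at the end use the Runner calls shown above.

```python
import argparse


def parse_args(argv=None):
    """Parse hypothetical command-line options for a Runner script."""
    parser = argparse.ArgumentParser(description="Run a skrl Runner experiment")
    parser.add_argument("--cfg", default="path/to/cfg.yaml", help="YAML config file")
    parser.add_argument("--checkpoint", default=None, help="optional agent checkpoint")
    # restrict the mode to the two values accepted by Runner.run
    parser.add_argument("--mode", choices=["train", "eval"], default="train")
    return parser.parse_args(argv)


args = parse_args([])  # pass None (the default) to read sys.argv instead

# With skrl installed and `env` wrapped as above, the options would be used like this:
# cfg = Runner.load_cfg_from_yaml(args.cfg)
# runner = Runner(env, cfg)
# if args.checkpoint is not None:
#     runner.agent.load(args.checkpoint)
# runner.run(args.mode)
```

Restricting `--mode` with `choices` rejects an invalid mode at parse time, before any environment or model is created.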
seed: 42

# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
models:
  separate: False
  policy:  # see gaussian_model parameters
    class: GaussianMixin
    clip_actions: False
    clip_log_std: True
    min_log_std: -20.0
    max_log_std: 2.0
    initial_log_std: 0.0
    network:
      - name: net
        input: OBSERVATIONS
        layers: [32, 32]
        activations: elu
    output: ACTIONS
  value:  # see deterministic_model parameters
    class: DeterministicMixin
    clip_actions: False
    network:
      - name: net
        input: OBSERVATIONS
        layers: [32, 32]
        activations: elu
    output: ONE

# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
  class: RandomMemory
  memory_size: -1  # automatically determined (same as agent:rollouts)

# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
  class: PPO
  rollouts: 32
  learning_epochs: 8
  mini_batches: 8
  discount_factor: 0.99
  gae_lambda: 0.95
  learning_rate: 5.0e-04
  learning_rate_scheduler: KLAdaptiveLR
  learning_rate_scheduler_kwargs:
    kl_threshold: 0.008
  observation_preprocessor: RunningStandardScaler
  observation_preprocessor_kwargs: null
  state_preprocessor: null
  state_preprocessor_kwargs: null
  value_preprocessor: RunningStandardScaler
  value_preprocessor_kwargs: null
  random_timesteps: 0
  learning_starts: 0
  grad_norm_clip: 1.0
  ratio_clip: 0.2
  value_clip: 0.2
  entropy_loss_scale: 0.0
  value_loss_scale: 2.0
  kl_threshold: 0.0
  time_limit_bootstrap: False
  rewards_shaper_scale: 0.1
  # logging and checkpoint
  experiment:
    directory: "cartpole_direct"
    experiment_name: ""
    write_interval: auto
    checkpoint_interval: auto

# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
  class: SequentialTrainer
  timesteps: 4800
  environment_info: log
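To make the relationship between `rollouts`, `mini_batches`, `memory_size` and `timesteps` concrete, here is a small worked calculation in plain Python. It assumes a hypothetical environment with 64 parallel sub-environments (`num_envs` does not appear in the config above). It also assumes the usual PPO scheme, in which an update is triggered every `rollouts` timesteps and the collected samples are split evenly into `mini_batches` chunks. Treat it as a sketch and check the PPO agent documentation for the exact behaviour.

```python
def rollout_summary(rollouts, mini_batches, learning_epochs, timesteps, memory_size, num_envs):
    """Derive batch sizes and update counts from PPO-style settings (illustrative only)."""
    # memory_size: -1 means "same as agent:rollouts", as noted in the config comment
    resolved_memory_size = rollouts if memory_size < 0 else memory_size
    samples_per_update = resolved_memory_size * num_envs
    return {
        "memory_size": resolved_memory_size,
        "samples_per_update": samples_per_update,
        "mini_batch_size": samples_per_update // mini_batches,
        # assumption: one learning phase every `rollouts` timesteps
        "updates": timesteps // rollouts,
        "gradient_steps_per_update": learning_epochs * mini_batches,
    }


summary = rollout_summary(
    rollouts=32,         # agent:rollouts
    mini_batches=8,      # agent:mini_batches
    learning_epochs=8,   # agent:learning_epochs
    timesteps=4800,      # trainer:timesteps
    memory_size=-1,      # memory:memory_size
    num_envs=64,         # hypothetical value, not taken from the config
)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Under these assumptions each update draws on 2048 samples in mini-batches of 256, and the 4800 timesteps yield 150 learning phases.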
from skrl.utils.runner.jax import Runner
from skrl.envs.wrappers.jax import wrap_env
# load and wrap some environment
env = ...
env = wrap_env(env)
# load the experiment config and instantiate the runner
cfg = Runner.load_cfg_from_yaml("path/to/cfg.yaml")
runner = Runner(env, cfg)
# load a checkpoint to continue training or for evaluation (optional)
runner.agent.load("path/to/checkpoints/agent.pickle")
# run the training
runner.run("train") # or "eval" for evaluation
from skrl.utils.runner.warp import Runner
from skrl.envs.wrappers.warp import wrap_env
# load and wrap some environment
env = ...
env = wrap_env(env)
# load the experiment config and instantiate the runner
cfg = Runner.load_cfg_from_yaml("path/to/cfg.yaml")
runner = Runner(env, cfg)
# load a checkpoint to continue training or for evaluation (optional)
runner.agent.load("path/to/checkpoints/agent.pickle")
# run the training
runner.run("train") # or "eval" for evaluation
API¶
PyTorch¶
Experiment runner.
- class skrl.utils.runner.torch.Runner(env: Wrapper | MultiAgentEnvWrapper, cfg: dict[str, Any], *, verbose: bool = False)
Bases: object
Experiment runner.
Configure and instantiate skrl components to execute training/evaluation workflows in a few lines of code.
- Parameters:
env – Environment to train on.
cfg – Runner configuration.
verbose – Whether to print extra information about the setup.
Methods:
load_cfg_from_yaml(path) – Load a runner configuration from a yaml file.
run([mode]) – Run the training/evaluation.
- static load_cfg_from_yaml(path: str) → dict
Load a runner configuration from a yaml file.
- Parameters:
path – File path.
- Returns:
Loaded configuration, or an empty dict if an error has occurred.
- run(mode: Literal['train', 'eval'] = 'train') → None
Run the training/evaluation.
- Parameters:
mode – Running mode: "train" for training or "eval" for evaluation.
- Raises:
ValueError – The specified running mode is not valid.
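Because `load_cfg_from_yaml` returns a plain dict (empty if an error occurred) and `run` raises `ValueError` for an unknown mode, it can help to check both before constructing the runner. The sketch below is plain Python with no skrl dependency. `prepare_cfg` and its dotted-key override format are hypothetical helpers, not part of skrl, and the inline dict only stands in for a loaded configuration.

```python
import copy

VALID_MODES = ("train", "eval")  # the two modes documented for Runner.run


def prepare_cfg(cfg, overrides=None, mode="train"):
    """Validate a loaded config and apply dotted-key overrides (hypothetical helper)."""
    if not cfg:
        raise ValueError("empty configuration: the YAML file may be missing or invalid")
    if mode not in VALID_MODES:
        raise ValueError(f"invalid mode {mode!r}, expected one of {VALID_MODES}")
    cfg = copy.deepcopy(cfg)  # leave the caller's dict untouched
    for dotted_key, value in (overrides or {}).items():
        *parents, leaf = dotted_key.split(".")
        node = cfg
        for key in parents:
            node = node.setdefault(key, {})  # create missing sections on the way down
        node[leaf] = value
    return cfg


# Stand-in for the dict returned by Runner.load_cfg_from_yaml
loaded = {"seed": 42, "agent": {"class": "PPO", "rollouts": 32}, "trainer": {"timesteps": 4800}}
cfg = prepare_cfg(loaded, {"trainer.timesteps": 9600, "agent.experiment.directory": "runs"})
```

The returned `cfg` could then be passed to `Runner(env, cfg)`. Working on a deep copy keeps the originally loaded dict available for comparison or for a second run.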
JAX¶
Experiment runner.
- class skrl.utils.runner.jax.Runner(env: Wrapper | MultiAgentEnvWrapper, cfg: dict[str, Any], *, verbose: bool = False)
Bases: object
Experiment runner.
Configure and instantiate skrl components to execute training/evaluation workflows in a few lines of code.
- Parameters:
env – Environment to train on.
cfg – Runner configuration.
verbose – Whether to print extra information about the setup.
Methods:
load_cfg_from_yaml(path) – Load a runner configuration from a yaml file.
run([mode]) – Run the training/evaluation.
- static load_cfg_from_yaml(path: str) → dict
Load a runner configuration from a yaml file.
- Parameters:
path – File path.
- Returns:
Loaded configuration, or an empty dict if an error has occurred.
- run(mode: Literal['train', 'eval'] = 'train') → None
Run the training/evaluation.
- Parameters:
mode – Running mode: "train" for training or "eval" for evaluation.
- Raises:
ValueError – The specified running mode is not valid.
Warp¶
Experiment runner.