Examples#
In this section, you will find a variety of examples that demonstrate how to use this library to solve reinforcement learning tasks. With the knowledge and skills you gain from trying these examples, you will be well on your way to using this library to solve your reinforcement learning problems.
Note
It is recommended to use the table of contents in the right sidebar for better navigation.
Gymnasium / Gym#
Gymnasium / Gym environments#
Training/evaluation of an agent in Gymnasium / Gym environments (one agent, one environment)

Benchmark results are listed in Benchmark results #32 (Gymnasium/Gym)
Environment | Script | Checkpoint (Hugging Face)
---|---|---
CartPole | |
FrozenLake | |
Pendulum | |
PendulumNoVel* | |
Taxi | |
Note
(*) The examples use a wrapper around the original environment to mask the velocity in the observation. The intention is to make the MDP partially observable and to show the capabilities of recurrent neural networks
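As a rough illustration of the masking idea (a minimal sketch, not the exact wrapper used by the example scripts), a Gymnasium observation wrapper that zeroes out Pendulum's angular velocity could look like this:
import gymnasium as gym

# a minimal sketch: zero out the angular velocity in Pendulum's observation so the
# MDP becomes partially observable (the example scripts may implement this differently)
class NoVelocityWrapper(gym.ObservationWrapper):
    def observation(self, observation):
        # Pendulum-v1 observation: [cos(theta), sin(theta), angular velocity]
        observation[2] = 0.0
        return observation

env = NoVelocityWrapper(gym.make("Pendulum-v1"))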
Gymnasium / Gym vectorized environments#
Training/evaluation of an agent in Gymnasium / Gym vectorized environments (one agent, multiple independent copies of the same environment in parallel)
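For reference, several independent copies of the same environment can be created directly with Gymnasium before wrapping the result for skrl; the environment id and number of copies below are illustrative, and the exact vector API may vary with the Gymnasium version:
import gymnasium as gym

# a minimal sketch: run 4 independent copies of the same environment in parallel
# (gym.vector.make is the vector API of the Gymnasium versions this page targets)
env = gym.vector.make("Pendulum-v1", num_envs=4)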
Environment | Script | Checkpoint (Hugging Face)
---|---|---
CartPole | |
FrozenLake | |
Pendulum | |
Taxi | |
Shimmy (API conversion)#
The following examples show training in several popular environments (Atari, DeepMind Control and OpenAI Gym) that have been converted to the Gymnasium API using the Shimmy package (API conversion tool).

Note
No extra implementation is required in skrl, since it fully supports the Gymnasium API.
Note
Because the Gymnasium API requires that the rendering mode be specified during the initialization of the environment, setting the headless option in the trainer configuration is not enough to render the environment. In this case, it is necessary to call the gymnasium.make function with render_mode="human" or any other supported option.
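For instance, a minimal sketch of enabling rendering at environment creation, assuming the wrap_env utility import path of recent skrl versions (the environment id is only illustrative):
import gymnasium as gym

from skrl.envs.wrappers.torch import wrap_env  # import path of recent skrl versions; may differ in older releases

# a minimal sketch: the rendering mode must be chosen when the environment is created,
# since it cannot be enabled later through the trainer's headless option
env = gym.make("Pendulum-v1", render_mode="human")
env = wrap_env(env)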
Environment | Script | Checkpoint (Hugging Face)
---|---|---
Atari: Pong | |
DeepMind: Acrobot | |
Gym-v21 compatibility | |
Other supported APIs#
DeepMind environments#
These examples perform the training of one agent in a DeepMind environment (one agent, one environment)

Environment | Script | Checkpoint (Hugging Face)
---|---|---
Control: Cartpole SwingUp | |
Manipulation: Reach Site Vision | |
Robosuite environments#
These examples perform the training of one agent in a robosuite environment (one agent, one environment)

Environment | Script | Checkpoint (Hugging Face)
---|---|---
TwoArmLift | |
Bi-DexHands environments#
Multi-agent training/evaluation in a Bi-DexHands environment

Environment | Script | Checkpoint (Hugging Face)
---|---|---
ShadowHandOver | |
NVIDIA Isaac Gym preview#
Isaac Gym environments#
Training/evaluation of an agent in Isaac Gym environments (one agent, multiple environments)
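As a brief orientation, recent skrl versions provide loader and wrapper utilities for Isaac Gym preview environments; a minimal sketch is shown below (the task name is only illustrative, and the import paths may differ in older skrl releases):
# a minimal sketch, assuming the loader/wrapper import paths of recent skrl versions
from skrl.envs.loaders.torch import load_isaacgym_env_preview4
from skrl.envs.wrappers.torch import wrap_env

# load an Isaac Gym (preview 4) environment by its task name and wrap it for skrl
env = load_isaacgym_env_preview4(task_name="Cartpole")
env = wrap_env(env)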

The agent configuration is mapped, as far as possible, from the IsaacGymEnvs configuration for rl_games. Shared models or separated models are used depending on the value of the network.separate variable. The following list shows the mapping between the two configurations:
# PPO agent (skrl parameter = rl_games parameter)
# memory
memory_size = horizon_length
# agent
rollouts = horizon_length
learning_epochs = mini_epochs
mini_batches = horizon_length * num_actors / minibatch_size
discount_factor = gamma
lambda = tau
learning_rate = learning_rate
learning_rate_scheduler = skrl.resources.schedulers.torch.KLAdaptiveLR
learning_rate_scheduler_kwargs = {"kl_threshold": kl_threshold}
random_timesteps = 0
learning_starts = 0
grad_norm_clip = grad_norm # if truncate_grads else 0
ratio_clip = e_clip
value_clip = e_clip
clip_predicted_values = clip_value
entropy_loss_scale = entropy_coef
value_loss_scale = 0.5 * critic_coef
kl_threshold = 0
rewards_shaper = lambda rewards, timestep, timesteps: rewards * scale_value
# trainer
timesteps = horizon_length * max_epochs
# SAC agent (skrl parameter = rl_games parameter)
# memory
memory_size = replay_buffer_size / num_envs
# agent
gradient_steps = 1
batch_size = batch_size
discount_factor = gamma
polyak = critic_tau
actor_learning_rate = actor_lr
critic_learning_rate = critic_lr
random_timesteps = num_warmup_steps * num_steps_per_episode
learning_starts = num_warmup_steps * num_steps_per_episode
grad_norm_clip = 0
learn_entropy = learnable_temperature
entropy_learning_rate = alpha_lr
initial_entropy_value = init_alpha
target_entropy = None
rewards_shaper = lambda rewards, timestep, timesteps: rewards * scale_value
# trainer
timesteps = num_steps_per_episode * max_epochs
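To make the PPO mapping above concrete, a hypothetical rl_games configuration with horizon_length=16, mini_epochs=8 and e_clip=0.2 (values chosen only for illustration, not taken from any specific task) would translate into an skrl PPO configuration roughly as sketched below, assuming the PPO default configuration dictionary of recent skrl versions:
from skrl.agents.torch.ppo import PPO_DEFAULT_CONFIG
from skrl.resources.schedulers.torch import KLAdaptiveLR

# a minimal sketch with hypothetical rl_games values
cfg = PPO_DEFAULT_CONFIG.copy()
cfg["rollouts"] = 16                                   # horizon_length
cfg["learning_epochs"] = 8                             # mini_epochs
cfg["ratio_clip"] = 0.2                                # e_clip
cfg["value_clip"] = 0.2                                # e_clip
cfg["learning_rate_scheduler"] = KLAdaptiveLR
cfg["learning_rate_scheduler_kwargs"] = {"kl_threshold": 0.008}
cfg["rewards_shaper"] = lambda rewards, timestep, timesteps: rewards * 0.01  # scale_value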
Benchmark results are listed in Benchmark results #32 (NVIDIA Isaac Gym)
Note
Isaac Gym environments implement functionality to get their configuration from the command line. Because of this feature, setting the headless option from the trainer configuration will not work. In this case, it is necessary to invoke the scripts as follows: python script.py headless=True for Isaac Gym environments (preview 3 and preview 4), or python script.py --headless for Isaac Gym environments (preview 2).
Environment | Script | Checkpoint (Hugging Face)
---|---|---
AllegroHand | |
Ant | |
Anymal | |
AnymalTerrain | |
BallBalance | |
Cartpole | |
FactoryTaskNutBoltPick | |
FactoryTaskNutBoltPlace | |
FactoryTaskNutBoltScrew | |
FrankaCabinet | |
FrankaCubeStack | |
Humanoid | |
Humanoid-AMP | |
Ingenuity | |
Quadcopter | |
ShadowHand | |
Trifinger | |
NVIDIA Isaac Orbit#
Isaac Orbit environments#
Training/evaluation of an agent in Isaac Orbit environments (one agent, multiple environments)

The agent configuration is mapped, as far as possible, from the Isaac Orbit configuration for rl_games. Shared models or separated models are used depending on the value of the network.separate variable. The following list shows the mapping between the two configurations:
# PPO agent (skrl parameter = rl_games parameter)
# memory
memory_size = horizon_length
# agent
rollouts = horizon_length
learning_epochs = mini_epochs
mini_batches = horizon_length * num_actors / minibatch_size
discount_factor = gamma
lambda = tau
learning_rate = learning_rate
learning_rate_scheduler = skrl.resources.schedulers.torch.KLAdaptiveLR
learning_rate_scheduler_kwargs = {"kl_threshold": kl_threshold}
random_timesteps = 0
learning_starts = 0
grad_norm_clip = grad_norm # if truncate_grads else 0
ratio_clip = e_clip
value_clip = e_clip
clip_predicted_values = clip_value
entropy_loss_scale = entropy_coef
value_loss_scale = 0.5 * critic_coef
kl_threshold = 0
rewards_shaper = lambda rewards, timestep, timesteps: rewards * scale_value
# trainer
timesteps = horizon_length * max_epochs
# SAC agent (skrl parameter = rl_games parameter)
# memory
memory_size = replay_buffer_size / num_envs
# agent
gradient_steps = 1
batch_size = batch_size
discount_factor = gamma
polyak = critic_tau
actor_learning_rate = actor_lr
critic_learning_rate = critic_lr
random_timesteps = num_warmup_steps * num_steps_per_episode
learning_starts = num_warmup_steps * num_steps_per_episode
grad_norm_clip = 0
learn_entropy = learnable_temperature
entropy_learning_rate = alpha_lr
initial_entropy_value = init_alpha
target_entropy = None
rewards_shaper = lambda rewards, timestep, timesteps: rewards * scale_value
# trainer
timesteps = num_steps_per_episode * max_epochs
Benchmark results are listed in Benchmark results #32 (NVIDIA Isaac Orbit)
Note
Isaac Orbit environments implement functionality to get their configuration from the command line. Because of this feature, setting the headless option from the trainer configuration will not work. In this case, it is necessary to invoke the scripts as follows: orbit -p script.py --headless
Environment | Script | Checkpoint (Hugging Face)
---|---|---
Isaac-Ant-v0 | |
Isaac-Cartpole-v0 | |
Isaac-Humanoid-v0 | |
Isaac-Lift-Franka-v0 | |
Isaac-Reach-Franka-v0 | |
Isaac-Velocity-Anymal-C-v0 | |
NVIDIA Omniverse Isaac Gym#
Omniverse Isaac Gym environments (OIGE)#
Training/evaluation of an agent in Omniverse Isaac Gym environments (OIGE) (one agent, multiple environments)

The agent configuration is mapped, as far as possible, from the OmniIsaacGymEnvs configuration for rl_games. Shared models or separated models are used depending on the value of the network.separate variable. The following list shows the mapping between the two configurations:
# PPO agent (skrl parameter = rl_games parameter)
# memory
memory_size = horizon_length
# agent
rollouts = horizon_length
learning_epochs = mini_epochs
mini_batches = horizon_length * num_actors / minibatch_size
discount_factor = gamma
lambda = tau
learning_rate = learning_rate
learning_rate_scheduler = skrl.resources.schedulers.torch.KLAdaptiveLR
learning_rate_scheduler_kwargs = {"kl_threshold": kl_threshold}
random_timesteps = 0
learning_starts = 0
grad_norm_clip = grad_norm # if truncate_grads else 0
ratio_clip = e_clip
value_clip = e_clip
clip_predicted_values = clip_value
entropy_loss_scale = entropy_coef
value_loss_scale = 0.5 * critic_coef
kl_threshold = 0
rewards_shaper = lambda rewards, timestep, timesteps: rewards * scale_value
# trainer
timesteps = horizon_length * max_epochs
# SAC agent (skrl parameter = rl_games parameter)
# memory
memory_size = replay_buffer_size / num_envs
# agent
gradient_steps = 1
batch_size = batch_size
discount_factor = gamma
polyak = critic_tau
actor_learning_rate = actor_lr
critic_learning_rate = critic_lr
random_timesteps = num_warmup_steps * num_steps_per_episode
learning_starts = num_warmup_steps * num_steps_per_episode
grad_norm_clip = 0
learn_entropy = learnable_temperature
entropy_learning_rate = alpha_lr
initial_entropy_value = init_alpha
target_entropy = None
rewards_shaper = lambda rewards, timestep, timesteps: rewards * scale_value
# trainer
timesteps = num_steps_per_episode * max_epochs
Benchmark results are listed in Benchmark results #32 (NVIDIA Omniverse Isaac Gym)
Note
Omniverse Isaac Gym environments implement functionality to get their configuration from the command line. Because of this feature, setting the headless option from the trainer configuration will not work. In this case, it is necessary to invoke the scripts as follows: python script.py headless=True
Environment | Script | Checkpoint (Hugging Face)
---|---|---
AllegroHand | |
Ant | |
Ant (multi-threaded) | |
Anymal | |
AnymalTerrain | |
BallBalance | |
Cartpole | |
Cartpole (multi-threaded) | |
Crazyflie | |
FactoryTaskNutBoltPick | |
FrankaCabinet | |
Humanoid | |
Ingenuity | |
Quadcopter | |
ShadowHand | |
Omniverse Isaac Gym environments (simultaneous learning by scope)#
Simultaneous training/evaluation by scopes (subsets of environments among all available environments) of several agents in the same run in OIGE’s Ant environment (multiple agents and environments)

Three cases are presented (a trainer usage sketch for the first case is shown after the list):
Simultaneous (sequential) training of agents that share the same memory and whose scopes are automatically selected to be as equal as possible.
Simultaneous (sequential) training of agents without sharing memory and whose scopes are specified manually.
Simultaneous (parallel) training of agents without sharing memory and whose scopes are specified manually.
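For the first case, a minimal sketch of how scopes are passed to skrl's sequential trainer; the agent and environment objects and the scope sizes are hypothetical placeholders:
from skrl.trainers.torch import SequentialTrainer

# a minimal sketch: agent_0, agent_1 and env are assumed to be already instantiated,
# with env wrapping 512 parallel sub-environments (values are illustrative)
cfg_trainer = {"timesteps": 16000, "headless": True}
# assign the first 256 sub-environments to agent_0 and the remaining 256 to agent_1;
# omitting agents_scope lets the trainer split the environments as evenly as possible
trainer = SequentialTrainer(env=env,
                            agents=[agent_0, agent_1],
                            agents_scope=[256, 256],
                            cfg=cfg_trainer)
trainer.train()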
Note
Omniverse Isaac Gym environments implement functionality to get their configuration from the command line. Because of this feature, setting the headless option from the trainer configuration will not work. In this case, it is necessary to invoke the scripts as follows: python script.py headless=True
Type | Script
---|---
Sequential training (shared memory) |
Sequential training (unshared memory) |
Parallel training (unshared memory) |
Omniverse Isaac Sim (single environment)#
Training/evaluation of an agent in Omniverse Isaac Sim environment implemented using the Gym interface (one agent, one environment)
This example performs the training of an agent in Isaac Sim’s Cartpole environment described in the Creating New RL Environment tutorial
Use the steps described below to set up and launch the experiment after following the tutorial
# download the sample code from GitHub in the directory containing the cartpole_task.py script
wget https://raw.githubusercontent.com/Toni-SM/skrl/main/docs/source/examples/isaacsim/torch_isaacsim_cartpole_ppo.py
# run the experiment
PYTHON_PATH torch_isaacsim_cartpole_ppo.py
Environment | Script | Checkpoint (Hugging Face)
---|---|---
Cartpole | |
This example performs the training of an agent in the Isaac Sim’s JetBot environment. The following components or practices are exemplified (highlighted):
Define and instantiate Convolutional Neural Networks (CNN) to learn from 128 x 128 RGB images
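A rough sketch of such a convolutional feature extractor (plain PyTorch, not the exact network defined in the example script) for 128 x 128 RGB inputs:
import torch
import torch.nn as nn

# a minimal sketch: convolutional feature extractor for 128 x 128 RGB observations
features = nn.Sequential(nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
                         nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                         nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
                         nn.Flatten())

# probe the flattened feature size with a dummy batch (N, C, H, W)
with torch.no_grad():
    print(features(torch.zeros(1, 3, 128, 128)).shape)  # torch.Size([1, 9216])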
Use the steps described below (for a local workstation or a remote container) to set up and launch the experiment
# create a working directory and change to it
mkdir ~/.local/share/ov/pkg/isaac_sim-2021.2.1/standalone_examples/api/omni.isaac.jetbot/skrl_example
cd ~/.local/share/ov/pkg/isaac_sim-2021.2.1/standalone_examples/api/omni.isaac.jetbot/skrl_example
# install the skrl library in editable mode from the working directory
~/.local/share/ov/pkg/isaac_sim-2021.2.1/python.sh -m pip install -e git+https://github.com/Toni-SM/skrl.git#egg=skrl
# download the sample code from GitHub
wget https://raw.githubusercontent.com/Toni-SM/skrl/main/docs/source/examples/isaacsim/torch_isaacsim_jetbot_ppo.py
# copy the Isaac Sim sample environment (JetBotEnv) to the working directory
cp ../stable_baselines_example/env.py .
# run the experiment
~/.local/share/ov/pkg/isaac_sim-2021.2.1/python.sh torch_isaacsim_jetbot_ppo.py
# create a working directory and change to it
mkdir /isaac-sim/standalone_examples/api/omni.isaac.jetbot/skrl_example
cd /isaac-sim/standalone_examples/api/omni.isaac.jetbot/skrl_example
# install the skrl library in editable mode from the working directory
/isaac-sim/kit/python/bin/python3 -m pip install -e git+https://github.com/Toni-SM/skrl.git#egg=skrl
# download the sample code from GitHub
wget https://raw.githubusercontent.com/Toni-SM/skrl/main/docs/source/examples/isaacsim/torch_isaacsim_jetbot_ppo.py
# copy the Isaac Sim sample environment (JetBotEnv) to the working directory
cp ../stable_baselines_example/env.py .
# run the experiment
/isaac-sim/python.sh torch_isaacsim_jetbot_ppo.py
Environment | Script | Checkpoint (Hugging Face)
---|---|---
JetBot | |
Real-world examples#
These examples show basic real-world and sim2real use cases to guide and support advanced RL implementations
3D reaching task (Franka’s gripper must reach a certain target point in space). The training was done in Omniverse Isaac Gym. The real robot control is performed through the Python API of a modified version of frankx (see frankx’s pull request #44), a high-level motion library around libfranka. Training and evaluation are performed for both Cartesian and joint control spaces.
Implementation (see details in the table below):
The observation space is composed of the episode’s normalized progress, the robot joints’ normalized positions (\(q\)) in the interval -1 to 1, the robot joints’ velocities (\(\dot{q}\)) affected by a random uniform scale for generalization, and the target’s position in space (\(target_{_{XYZ}}\)) with respect to the robot’s base (a sketch of how this vector is assembled is shown after the table below)
The action space, bounded in the range -1 to 1, consists of the scaled change of the robot joints’ positions for joint control, or the scaled change of the end-effector’s position (\(ee_{_{XYZ}}\)) for Cartesian control. The end-effector position frame corresponds to the point where the left finger connects to the gripper base in simulation, whereas in the real world it corresponds to the end of the fingers. The gripper fingers remain closed all the time in both cases
The instantaneous reward is the negative value of the Euclidean distance (\(\text{d}\)) between the robot end-effector and the target point position. The episode terminates when this distance is less than 0.035 meters in simulation (0.075 meters in the real world) or when the defined maximum timestep is reached
The target position lies within a rectangular cuboid of dimensions 0.5 x 0.5 x 0.2 meters centered at (0.5, 0.0, 0.2) meters with respect to the robot’s base. The robot joints’ positions are drawn from an initial configuration [0º, -45º, 0º, -135º, 0º, 90º, 45º] modified with uniform random values between approximately -7º and 7º
Variable | Formula / value | Size
---|---|---
Observation space | \(\dfrac{t}{t_{max}},\; 2 \dfrac{q - q_{min}}{q_{max} - q_{min}} - 1,\; 0.1\,\dot{q}\,U(0.5,1.5),\; target_{_{XYZ}}\) | 18
Action space (joint) | \(\dfrac{2.5}{120} \, \Delta q\) | 7
Action space (Cartesian) | \(\dfrac{1}{100} \, \Delta ee_{_{XYZ}}\) | 3
Reward | \(-\text{d}(ee_{_{XYZ}},\; target_{_{XYZ}})\) |
Episode termination | \(\text{d}(ee_{_{XYZ}},\; target_{_{XYZ}}) \le 0.035 \quad\) or \(\quad t \ge t_{max} - 1\) |
Maximum timesteps (\(t_{max}\)) | 100 |
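As referenced above, a minimal sketch (with hypothetical variable names) of how an observation vector matching the table could be assembled:
import numpy as np

# a minimal sketch: assemble the 18-dimensional observation described in the table
# (1 progress value + 7 normalized joint positions + 7 scaled joint velocities + 3 target coordinates)
def compute_observation(t, t_max, q, q_min, q_max, dq, target_xyz):
    progress = np.array([t / t_max])
    q_norm = 2.0 * (q - q_min) / (q_max - q_min) - 1.0
    dq_scaled = 0.1 * dq * np.random.uniform(0.5, 1.5, size=dq.shape)
    return np.concatenate([progress, q_norm, dq_scaled, target_xyz])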
Workflows:
Warning
Make sure you have the e-stop on hand in case something goes wrong in the run. Control via RL can be dangerous and unsafe for both the operator and the robot
Target position in X and Y obtained with a USB-camera (position in Z fixed at 0.2 m)
Prerequisites:
A physical Franka Emika Panda robot with Franka Control Interface (FCI) is required. Additionally, the frankx library must be available in the python environment (see frankx’s pull request #44 for the RL-compatible version installation)
Files
Environment:
reaching_franka_real_env.py
Evaluation script:
reaching_franka_real_skrl_eval.py
Checkpoints (agent_joint.pt, agent_cartesian.pt): trained_checkpoints.zip
Evaluation:
python3 reaching_franka_real_skrl_eval.py
Main environment configuration:
Note
In the joint control space the final control of the robot is performed through the Cartesian pose (forward kinematics from specified values for the joints)
The control space (Cartesian or joint), the robot motion type (waypoint or impedance) and the target position acquisition (command prompt / automatically generated, or USB-camera) can be specified in the environment class constructor (in reaching_franka_real_skrl_eval.py) as follows:
control_space = "joint" # joint or cartesian
motion_type = "waypoint" # waypoint or impedance
camera_tracking = False # True for USB-camera tracking

Prerequisites:
All installation steps described in Omniverse Isaac Gym’s Overview & Getting Started section must be fulfilled (especially the subsection 1.3. Installing Examples Repository)
Files (the implementation is self-contained so no specific location is required):
Environment:
reaching_franka_omniverse_isaacgym_env.py
Training script:
reaching_franka_omniverse_isaacgym_skrl_train.py
Evaluation script:
reaching_franka_omniverse_isaacgym_skrl_eval.py
Checkpoints (agent_joint.pt, agent_cartesian.pt): trained_checkpoints.zip
Training and evaluation:
# training (local workstation)
~/.local/share/ov/pkg/isaac_sim-*/python.sh reaching_franka_omniverse_isaacgym_skrl_train.py
# training (docker container)
/isaac-sim/python.sh reaching_franka_omniverse_isaacgym_skrl_train.py
# evaluation (local workstation)
~/.local/share/ov/pkg/isaac_sim-*/python.sh reaching_franka_omniverse_isaacgym_skrl_eval.py
# evaluation (docker container)
/isaac-sim/python.sh reaching_franka_omniverse_isaacgym_skrl_eval.py
Main environment configuration:
The control space (Cartesian or joint) can be specified in the task configuration dictionary (in reaching_franka_omniverse_isaacgym_skrl_train.py) as follows:
TASK_CFG["task"]["env"]["controlSpace"] = "joint" # "joint" or "cartesian"

Prerequisites:
All installation steps described in Isaac Gym’s Installation section must be fulfilled
Files (the implementation is self-contained so no specific location is required):
Environment:
reaching_franka_isaacgym_env.py
Training script:
reaching_franka_isaacgym_skrl_train.py
Evaluation script:
reaching_franka_isaacgym_skrl_eval.py
Training and evaluation:
Note
The checkpoints obtained in Isaac Gym were not evaluated with the real robot. However, they were evaluated in Omniverse Isaac Gym showing successful performance
# training (with the Python virtual environment active)
python reaching_franka_isaacgym_skrl_train.py
# evaluation (with the Python virtual environment active)
python reaching_franka_isaacgym_skrl_eval.py
Main environment configuration:
The control space (Cartesian or joint) can be specified in the task configuration dictionary (in reaching_franka_isaacgym_skrl_train.py) as follows:
TASK_CFG["env"]["controlSpace"] = "joint" # "joint" or "cartesian"
3D reaching task (iiwa’s end-effector must reach a certain target point in space). The training was done in Omniverse Isaac Gym. The real robot control is performed through the Python, ROS and ROS2 APIs of libiiwa, a scalable multi-control framework for KUKA LBR iiwa robots. Training and evaluation are performed for both Cartesian and joint control spaces.
Implementation (see details in the table below):
The observation space is composed of the episode’s normalized progress, the robot joints’ normalized positions (\(q\)) in the interval -1 to 1, the robot joints’ velocities (\(\dot{q}\)) affected by a random uniform scale for generalization, and the target’s position in space (\(target_{_{XYZ}}\)) with respect to the robot’s base
The action space, bounded in the range -1 to 1, consists of the scaled change of the robot joints’ positions for joint control, or the scaled change of the end-effector’s position (\(ee_{_{XYZ}}\)) for Cartesian control
The instantaneous reward is the negative value of the Euclidean distance (\(\text{d}\)) between the robot end-effector and the target point position. The episode terminates when this distance is less than 0.035 meters in simulation (0.075 meters in real-world) or when the defined maximum timestep is reached
The target position lies within a rectangular cuboid of dimensions 0.2 x 0.4 x 0.4 meters centered at (0.6, 0.0, 0.4) meters with respect to the robot’s base. The robot joints’ positions are drawn from an initial configuration [0º, 0º, 0º, -90º, 0º, 90º, 0º] modified with uniform random values between -7º and 7º approximately
Variable | Formula / value | Size
---|---|---
Observation space | \(\dfrac{t}{t_{max}},\; 2 \dfrac{q - q_{min}}{q_{max} - q_{min}} - 1,\; 0.1\,\dot{q}\,U(0.5,1.5),\; target_{_{XYZ}}\) | 18
Action space (joint) | \(\dfrac{2.5}{120} \, \Delta q\) | 7
Action space (Cartesian) | \(\dfrac{1}{100} \, \Delta ee_{_{XYZ}}\) | 3
Reward | \(-\text{d}(ee_{_{XYZ}},\; target_{_{XYZ}})\) |
Episode termination | \(\text{d}(ee_{_{XYZ}},\; target_{_{XYZ}}) \le 0.035 \quad\) or \(\quad t \ge t_{max} - 1\) |
Maximum timesteps (\(t_{max}\)) | 100 |
Workflows:
Warning
Make sure you have the smartHMI on hand in case something goes wrong in the run. Control via RL can be dangerous and unsafe for both the operator and the robot
Prerequisites:
A physical Kuka LBR iiwa robot is required. Additionally, the libiiwa library must be installed (visit the libiiwa documentation for installation details)
Files
Environment:
reaching_iiwa_real_env.py
Evaluation script:
reaching_iiwa_real_skrl_eval.py
Checkpoints (agent_joint.pt, agent_cartesian.pt): trained_checkpoints.zip
Evaluation:
python3 reaching_iiwa_real_skrl_eval.py
Main environment configuration:
The control space (Cartesian or joint) can be specified in the environment class constructor (in reaching_iiwa_real_skrl_eval.py) as follows:
control_space = "joint" # joint or cartesian
Warning
Make sure you have the smartHMI on hand in case something goes wrong in the run. Control via RL can be dangerous and unsafe for both the operator and the robot
Prerequisites:
A physical Kuka LBR iiwa robot is required. Additionally, the libiiwa library must be installed (visit the libiiwa documentation for installation details) and a Robot Operating System (ROS or ROS2) distribution must be available
Files
Environment (ROS):
reaching_iiwa_real_ros_env.py
Environment (ROS2):
reaching_iiwa_real_ros2_env.py
Evaluation script:
reaching_iiwa_real_ros_ros2_skrl_eval.py
Checkpoints (agent_joint.pt, agent_cartesian.pt): trained_checkpoints.zip
Note
Source the ROS/ROS2 distribution and the ROS/ROS2 workspace containing the libiiwa packages before executing the scripts
Evaluation:
Note
The environment (reaching_iiwa_real_ros_env.py or reaching_iiwa_real_ros2_env.py) to be loaded will be automatically selected based on the sourced ROS distribution (ROS or ROS2) at script execution
python3 reaching_iiwa_real_ros_ros2_skrl_eval.py
Main environment configuration:
The control space (Cartesian or joint) can be specified in the environment class constructor (in reaching_iiwa_real_ros_ros2_skrl_eval.py) as follows:
control_space = "joint" # joint or cartesian

Prerequisites:
All installation steps described in Omniverse Isaac Gym’s Overview & Getting Started section must be fulfilled (especially the subsection 1.3. Installing Examples Repository)
Files (the implementation is self-contained so no specific location is required):
Environment:
reaching_iiwa_omniverse_isaacgym_env.py
Training script:
reaching_iiwa_omniverse_isaacgym_skrl_train.py
Evaluation script:
reaching_iiwa_omniverse_isaacgym_skrl_eval.py
Checkpoints (agent_joint.pt, agent_cartesian.pt): trained_checkpoints.zip
Simulation files (.usd assets and robot class):
simulation_files.zip
Simulation files must be structured as follows:
<some_folder>
├── agent_cartesian.pt
├── agent_joint.pt
├── assets
│ ├── iiwa14_instanceable_meshes.usd
│ └── iiwa14.usd
├── reaching_iiwa_omniverse_isaacgym_env.py
├── reaching_iiwa_omniverse_isaacgym_skrl_eval.py
├── reaching_iiwa_omniverse_isaacgym_skrl_train.py
├── robots
│ ├── iiwa14.py
│ └── __init__.py
Training and evaluation:
# training (local workstation)
~/.local/share/ov/pkg/isaac_sim-*/python.sh reaching_iiwa_omniverse_isaacgym_skrl_train.py
# training (docker container)
/isaac-sim/python.sh reaching_iiwa_omniverse_isaacgym_skrl_train.py
# evaluation (local workstation)
~/.local/share/ov/pkg/isaac_sim-*/python.sh reaching_iiwa_omniverse_isaacgym_skrl_eval.py
# evaluation (docker container)
/isaac-sim/python.sh reaching_iiwa_omniverse_isaacgym_skrl_eval.py
Main environment configuration:
The control space (Cartesian or joint) can be specified in the task configuration dictionary (in reaching_iiwa_omniverse_isaacgym_skrl_train.py) as follows:
TASK_CFG["task"]["env"]["controlSpace"] = "joint" # "joint" or "cartesian"
Library utilities (skrl.utils module)#
This example shows how to use the library utilities to carry out the post-processing of files and data generated by the experiments
Example of a figure, generated by the code, showing the total reward (left) and the mean and standard deviation (right) of all experiments located in the runs folder
Note: The code will load all the Tensorboard files of the experiments located in the runs folder. It is necessary to adjust the iterator's parameters for other paths.
import numpy as np
import matplotlib.pyplot as plt

from skrl.utils import postprocessing


labels = []
rewards = []

# load the Tensorboard files and iterate over them (tag: "Reward / Total reward (mean)")
tensorboard_iterator = postprocessing.TensorboardFileIterator("runs/*/events.out.tfevents.*",
                                                              tags=["Reward / Total reward (mean)"])
for dirname, data in tensorboard_iterator:
    rewards.append(data["Reward / Total reward (mean)"])
    labels.append(dirname)

# convert to numpy arrays and compute mean and std
rewards = np.array(rewards)
mean = np.mean(rewards[:,:,1], axis=0)
std = np.std(rewards[:,:,1], axis=0)

# create two subplots (one for the individual rewards, one for the mean and std)
fig, ax = plt.subplots(1, 2, figsize=(15, 5))

# plot the rewards for each experiment
for reward, label in zip(rewards, labels):
    ax[0].plot(reward[:,0], reward[:,1], label=label)
ax[0].set_title("Total reward (for each experiment)")
ax[0].set_xlabel("Timesteps")
ax[0].set_ylabel("Reward")
ax[0].grid(True)
ax[0].legend()

# plot the mean and std (across experiments)
ax[1].fill_between(rewards[0,:,0], mean - std, mean + std, alpha=0.5, label="std")
ax[1].plot(rewards[0,:,0], mean, label="mean")
ax[1].set_title("Total reward (mean and std of all experiments)")
ax[1].set_xlabel("Timesteps")
ax[1].set_ylabel("Reward")
ax[1].grid(True)
ax[1].legend()

# save and show the figure
plt.savefig("total_reward.png")
plt.show()