Parallel trainer

Train agents in parallel using multiple processes.



Concept

[Figure: Parallel trainer]

Usage

Note

Each process adds a GPU memory overhead (~1GB, although it can be much higher) due to PyTorch’s CUDA kernels. See PyTorch Issue #12873 for more details.

Note

At the moment, only simultaneous training and evaluation of agents with local memory (no memory sharing) is implemented.

from skrl.trainers.torch import ParallelTrainer

# assuming there is an environment called 'env'
# and an agent or a list of agents called 'agents'

# create a parallel trainer
cfg = {"timesteps": 50000, "headless": False}
trainer = ParallelTrainer(env=env, agents=agents, cfg=cfg)

# train the agent(s)
trainer.train()

# evaluate the agent(s)
trainer.eval()

Configuration

Dataclass

ParallelTrainerCfg



API


PyTorch

ParallelTrainerCfg

Configuration for the parallel trainer.

ParallelTrainer

Parallel trainer.

class skrl.trainers.torch.parallel.ParallelTrainerCfg(*, timesteps: int = 100000, headless: bool = False, render_interval: int = 1, disable_progressbar: bool | None = False, close_environment_at_exit: bool = True, environment_info: str = 'episode', stochastic_evaluation: bool = False)

Bases: TrainerCfg

Configuration for the parallel trainer.

Methods:

expand()

Expand the configuration.

validate()

Validate the configuration.

Attributes:

close_environment_at_exit

Whether to close the environment on normal program termination.

disable_progressbar

Whether to disable the progressbar.

environment_info

Key used to get and log environment info.

headless

Whether to run in headless mode (do not call env.render()).

render_interval

Interval (in timesteps) for rendering the environments.

stochastic_evaluation

Whether to use stochastic (sampled) actions rather than deterministic mean actions during evaluation.

timesteps

Number of timesteps to train/evaluate for.

expand() → None

Expand the configuration.

validate() → bool

Validate the configuration.

close_environment_at_exit: bool = True

Whether to close the environment on normal program termination.

disable_progressbar: bool | None = False

Whether to disable the progressbar. If None, disable on non-TTY.

environment_info: str = 'episode'

Key used to get and log environment info.

headless: bool = False

Whether to run in headless mode (do not call env.render()).

render_interval: int = 1

Interval (in timesteps) for rendering the environments. Only effective if headless is False.
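The interaction between headless and render_interval can be illustrated with a small, self-contained sketch. The helper should_render is hypothetical (it is not part of skrl), and it assumes that rendering happens on timesteps that are multiples of render_interval; the library's exact condition may differ.

```python
def should_render(timestep: int, headless: bool = False, render_interval: int = 1) -> bool:
    """Return True if the environments would be rendered at this timestep.

    Assumption (not taken from skrl's source): rendering happens on
    timesteps that are multiples of render_interval, never when headless.
    """
    if headless:
        return False
    return timestep % render_interval == 0


# Timesteps in [0, 10) that would trigger a render with render_interval=4
rendered = [t for t in range(10) if should_render(t, render_interval=4)]
print(rendered)  # [0, 4, 8]

# In headless mode nothing is rendered, whatever the interval
print(any(should_render(t, headless=True, render_interval=4) for t in range(10)))  # False
```

A larger render_interval reduces rendering cost without disabling visualization entirely.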

stochastic_evaluation: bool = False

Whether to use stochastic (sampled) actions rather than deterministic mean actions during evaluation.
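As a rough illustration of what this flag distinguishes, the sketch below uses a one-dimensional Gaussian policy. The function select_action and its parameters are hypothetical and only show the idea; skrl's agents and models handle this internally.

```python
import random
from typing import Optional


def select_action(mean: float, std: float, stochastic_evaluation: bool = False,
                  rng: Optional[random.Random] = None) -> float:
    """Pick an evaluation action from a Gaussian policy (illustrative only)."""
    if not stochastic_evaluation:
        # Deterministic evaluation: use the mean of the action distribution
        return mean
    # Stochastic evaluation: sample from the action distribution
    if rng is None:
        rng = random.Random()
    return rng.gauss(mean, std)


deterministic = select_action(0.5, 0.1)
sampled = select_action(0.5, 0.1, stochastic_evaluation=True, rng=random.Random(0))
print(deterministic)  # 0.5
print(sampled)        # a value near 0.5, depends on the seed
```

Deterministic evaluation gives repeatable scores, while stochastic evaluation reflects the behavior of the policy as it acts during training.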

timesteps: int = 100000

Number of timesteps to train/evaluate for.
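To make the fields and defaults above concrete without depending on skrl, here is a stand-in dataclass that mirrors the documented attributes. TrainerCfgSketch, cfg_from_dict, progressbar_disabled, and the checks inside validate are all hypothetical; the real ParallelTrainerCfg may validate different conditions.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TrainerCfgSketch:
    """Stand-in mirroring the documented fields and defaults (not skrl's class)."""

    timesteps: int = 100000
    headless: bool = False
    render_interval: int = 1
    disable_progressbar: Optional[bool] = False
    close_environment_at_exit: bool = True
    environment_info: str = "episode"
    stochastic_evaluation: bool = False

    def validate(self) -> bool:
        # Hypothetical checks, chosen only for illustration
        return self.timesteps > 0 and self.render_interval >= 1

    def progressbar_disabled(self, is_tty: bool) -> bool:
        # Documented behavior: None means "disable on non-TTY"
        if self.disable_progressbar is None:
            return not is_tty
        return self.disable_progressbar


def cfg_from_dict(cfg: dict) -> TrainerCfgSketch:
    """Build the stand-in configuration from a plain dictionary."""
    known = {f.name for f in fields(TrainerCfgSketch)}
    unknown = set(cfg) - known
    if unknown:
        raise KeyError(f"unknown configuration keys: {sorted(unknown)}")
    return TrainerCfgSketch(**cfg)


cfg = cfg_from_dict({"timesteps": 50000, "headless": False})
print(cfg.timesteps, cfg.render_interval, cfg.validate())  # 50000 1 True
```

Keys that are omitted from the dictionary keep their defaults, which is why the usage example only needs to set timesteps and headless.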

class skrl.trainers.torch.parallel.ParallelTrainer(*, env: Wrapper | MultiAgentEnvWrapper, agents: Agent | MultiAgent | list[Agent] | list[MultiAgent], scopes: list[int] | None = None, cfg: ParallelTrainerCfg | dict = {})

Bases: Trainer

Parallel trainer.

Train agents in parallel using multiple processes.

Parameters:
  • env – Environment to train/evaluate on.

  • agents – Agent(s) to train/evaluate.

  • scopes – Number of environments for each simultaneous agent to train/evaluate on.

  • cfg – Configuration, given as a ParallelTrainerCfg instance or a dictionary.
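The scopes parameter can be pictured as a partition of the environment indices among the simultaneous agents. The sketch below is illustrative only: resolve_scopes is a hypothetical helper, and the default used when scopes is None (an even split with the remainder given to the last agent) is an assumption rather than confirmed library behavior.

```python
from typing import List, Optional, Tuple


def resolve_scopes(num_envs: int, num_agents: int,
                   scopes: Optional[List[int]] = None) -> List[Tuple[int, int]]:
    """Turn per-agent environment counts into [start, end) index ranges."""
    if scopes is None:
        # Assumed default: split evenly, give the remainder to the last agent
        base = num_envs // num_agents
        scopes = [base] * num_agents
        scopes[-1] += num_envs - base * num_agents
    if len(scopes) != num_agents:
        raise ValueError("one scope is required per agent")
    if sum(scopes) != num_envs:
        raise ValueError("scopes must add up to the number of environments")
    ranges = []
    start = 0
    for size in scopes:
        ranges.append((start, start + size))
        start += size
    return ranges


print(resolve_scopes(10, 3))          # [(0, 3), (3, 6), (6, 10)]
print(resolve_scopes(10, 2, [4, 6]))  # [(0, 4), (4, 10)]
```

Each agent then only sees the observations, rewards, and terminations of the environments inside its own range.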

Methods:

eval()

Evaluate agents in parallel.

train()

Train agents in parallel.

eval() → None

Evaluate agents in parallel.

This method executes the following steps in a loop:

  • Pre-interaction (in parallel)

  • Compute actions (in parallel)

  • Interact with the environments

  • Render environments

  • Record transitions (in parallel)

  • Reset environments

train() → None

Train agents in parallel.

This method executes the following steps in a loop:

  • Pre-interaction (in parallel)

  • Compute actions (in parallel)

  • Interact with the environments

  • Render environments

  • Record transitions (in parallel)

  • Post-interaction (in parallel)

  • Reset environments
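The order of these steps can be sketched in a single process with stub objects. RecordingAgent, CountingEnv, and run_loop are hypothetical and exist only to show the call order. In the actual trainer the steps marked "in parallel" are dispatched to worker processes, and resetting only when an episode ends is an assumption made for this sketch.

```python
class RecordingAgent:
    """Hypothetical stub agent that only records which hooks were called."""

    def __init__(self, log):
        self.log = log

    def pre_interaction(self, timestep):
        self.log.append("pre_interaction")

    def act(self, observation, timestep):
        self.log.append("act")
        return 0  # dummy action

    def record_transition(self, observation, action, reward, next_observation, done):
        self.log.append("record_transition")

    def post_interaction(self, timestep):
        self.log.append("post_interaction")


class CountingEnv:
    """Hypothetical stub environment whose episodes end after episode_length steps."""

    def __init__(self, log, episode_length=2):
        self.log = log
        self.episode_length = episode_length
        self.steps = 0

    def reset(self):
        self.log.append("reset")
        self.steps = 0
        return 0

    def step(self, action):
        self.log.append("step")
        self.steps += 1
        return self.steps, 1.0, self.steps >= self.episode_length

    def render(self):
        self.log.append("render")


def run_loop(env, agent, timesteps, training=True, headless=False):
    observation = env.reset()
    for timestep in range(timesteps):
        agent.pre_interaction(timestep)                    # in parallel
        action = agent.act(observation, timestep)          # in parallel
        next_observation, reward, done = env.step(action)  # main process
        if not headless:
            env.render()
        agent.record_transition(observation, action, reward, next_observation, done)  # in parallel
        if training:
            agent.post_interaction(timestep)               # in parallel, train() only
        observation = env.reset() if done else next_observation


log = []
run_loop(CountingEnv(log), RecordingAgent(log), timesteps=1, training=True)
print(log)
# ['reset', 'pre_interaction', 'act', 'step', 'render', 'record_transition', 'post_interaction']
```

Calling run_loop with training=False gives the eval() order, which omits the post-interaction step.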