Parallel trainer¶
Train agents in parallel using multiple processes.
Concept¶
Usage¶
Note
Each process adds a GPU memory overhead (~1GB, although it can be much higher) due to PyTorch’s CUDA kernels. See PyTorch Issue #12873 for more details.
Note
At the moment, only simultaneous training and evaluation of agents with local memory (no memory sharing) is implemented.
from skrl.trainers.torch import ParallelTrainer
# assuming there is an environment called 'env'
# and an agent or a list of agents called 'agents'
# create a parallel trainer
cfg = {"timesteps": 50000, "headless": False}
trainer = ParallelTrainer(env=env, agents=agents, cfg=cfg)
# train the agent(s)
trainer.train()
# evaluate the agent(s)
trainer.eval()
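For reference, the configuration dictionary can spell out every field listed in the ParallelTrainerCfg signature in the API section. The sketch below only builds the dictionary. It assumes that dictionary keys map one-to-one to the dataclass fields, as the two keys in the example above suggest, and all values except timesteps are the documented defaults.

```python
# Keys and defaults are copied from the documented ParallelTrainerCfg signature.
# Assumption: each dictionary key maps directly to the dataclass field of the same name.
cfg = {
    "timesteps": 50000,                 # documented default: 100000
    "headless": False,                  # False means env.render() is called
    "render_interval": 1,               # only relevant when headless is False
    "disable_progressbar": False,       # None would disable it on non-TTY output
    "close_environment_at_exit": True,
    "environment_info": "episode",
    "stochastic_evaluation": False,
}
```

Passing this dictionary as cfg should then behave like the shorter one above, since the omitted keys fall back to the same defaults.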
Configuration¶
API¶
PyTorch¶
| Class | Description |
|---|---|
| ParallelTrainerCfg | Configuration for the parallel trainer. |
| ParallelTrainer | Parallel trainer. |
- class skrl.trainers.torch.parallel.ParallelTrainerCfg(*, timesteps: int = 100000, headless: bool = False, render_interval: int = 1, disable_progressbar: bool | None = False, close_environment_at_exit: bool = True, environment_info: str = 'episode', stochastic_evaluation: bool = False)[source]¶
Bases: TrainerCfg
Configuration for the parallel trainer.
Methods:
Attributes:
close_environment_at_exit – Whether to close the environment on normal program termination.
disable_progressbar – Whether to disable the progressbar.
environment_info – Key used to get and log environment info.
headless – Whether to run in headless mode (do not call env.render()).
render_interval – Interval (in timesteps) for rendering the environments.
stochastic_evaluation – Whether to use stochastic actions rather than (deterministic) mean actions during evaluation.
timesteps – Number of timesteps to train/evaluate for.
- close_environment_at_exit: bool = True¶
Whether to close the environment on normal program termination.
- disable_progressbar: bool | None = False¶
Whether to disable the progressbar. If None, disable on non-TTY.
- render_interval: int = 1¶
Interval (in timesteps) for rendering the environments. Only effective if headless is False.
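The rules stated for disable_progressbar and render_interval can be sketched in plain Python. Both functions below are hypothetical helpers written for illustration, not part of skrl, and the trainer's internal logic may differ (for example, in which timestep the rendering interval starts counting from). They only encode what the descriptions above say: None means "disable on non-TTY", and rendering happens only when headless is False.

```python
import sys


def resolve_disable_progressbar(disable, stream=None):
    """Hypothetical helper: return True if the progress bar should be hidden."""
    if disable is None:
        # None means "decide automatically": hide the bar when output is not a terminal
        stream = sys.stderr if stream is None else stream
        isatty = getattr(stream, "isatty", None)
        return not (isatty() if callable(isatty) else False)
    return bool(disable)


def should_render(timestep, headless, render_interval):
    """Hypothetical helper: render every `render_interval` timesteps unless headless."""
    # Assumption: a non-positive interval disables rendering (also avoids modulo by zero)
    if headless or render_interval <= 0:
        return False
    return timestep % render_interval == 0
```

Under these assumptions, render_interval=2 with headless=False renders at timesteps 0, 2, 4 and so on, while headless=True never renders regardless of the interval.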
- class skrl.trainers.torch.parallel.ParallelTrainer(*, env: Wrapper | MultiAgentEnvWrapper, agents: Agent | MultiAgent | list[Agent] | list[MultiAgent], scopes: list[int] | None = None, cfg: ParallelTrainerCfg | dict = {})[source]¶
Bases: Trainer
Parallel trainer.
Train agents in parallel using multiple processes.
- Parameters:
env – Environment to train/evaluate on.
agents – Agent(s) to train/evaluate.
scopes – Number of environments for each simultaneous agent to train/evaluate on.
cfg – Trainer configuration, given as a ParallelTrainerCfg instance or a dictionary.
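To make the scopes parameter concrete, the sketch below turns a list of per-agent environment counts into contiguous index ranges over the vectorized environments. The function scopes_to_ranges is a hypothetical helper, not skrl API. The even split used when no scopes are given and the two validation checks are assumptions about sensible behavior rather than a statement of what the trainer does internally.

```python
def scopes_to_ranges(scopes, num_envs, num_agents):
    """Hypothetical helper: map per-agent environment counts to (start, end) ranges."""
    if not scopes:
        # Assumption: with no scopes given, split the environments evenly
        # and give any remainder to the last agent
        scopes = [num_envs // num_agents] * num_agents
        scopes[-1] += num_envs - sum(scopes)
    if len(scopes) != num_agents:
        raise ValueError(f"expected {num_agents} scopes, got {len(scopes)}")
    if sum(scopes) != num_envs:
        raise ValueError(f"scopes sum to {sum(scopes)}, expected {num_envs}")

    ranges, start = [], 0
    for count in scopes:
        # Each agent gets the next `count` environments (end index is exclusive)
        ranges.append((start, start + count))
        start += count
    return ranges


# Three agents sharing 512 environments
ranges = scopes_to_ranges([100, 200, 212], num_envs=512, num_agents=3)
```

Here the first agent would act on environments 0 to 99, the second on 100 to 299, and the third on 300 to 511.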
Methods: