Sequential trainer

Train agents sequentially (i.e., one after the other in each interaction with the environment).



Concept

[Figure: Sequential trainer]

Usage

from skrl.trainers.torch import SequentialTrainer

# assuming there is an environment called 'env'
# and an agent or a list of agents called 'agents'

# create a sequential trainer
cfg = {"timesteps": 50000, "headless": False}
trainer = SequentialTrainer(env=env, agents=agents, cfg=cfg)

# train the agent(s)
trainer.train()

# evaluate the agent(s)
trainer.eval()

Configuration

Dataclass

SequentialTrainerCfg (PyTorch, JAX and Warp)


API


PyTorch

SequentialTrainerCfg

Configuration for the sequential trainer.

SequentialTrainer

Sequential trainer.

class skrl.trainers.torch.sequential.SequentialTrainerCfg(*, timesteps: int = 100000, headless: bool = False, render_interval: int = 1, disable_progressbar: bool | None = False, close_environment_at_exit: bool = True, environment_info: str = 'episode', stochastic_evaluation: bool = False)

Bases: TrainerCfg

Configuration for the sequential trainer.

Methods:

expand()

Expand the configuration.

validate()

Validate the configuration.

Attributes:

close_environment_at_exit

Whether to close the environment on normal program termination.

disable_progressbar

Whether to disable the progress bar.

environment_info

Key used to get and log environment info.

headless

Whether to run in headless mode (do not call env.render()).

render_interval

Interval (in timesteps) for rendering the environments.

stochastic_evaluation

Whether to use stochastic (sampled) actions rather than deterministic mean actions during evaluation.

timesteps

Number of timesteps to train/evaluate for.

expand() → None

Expand the configuration.

validate() → bool

Validate the configuration.

close_environment_at_exit: bool = True

Whether to close the environment on normal program termination.
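A minimal sketch of closing an environment on normal program termination with Python's standard `atexit` module. `ToyEnv` and `register_env_close` are hypothetical names used only for illustration; this is not presented as skrl's implementation.

```python
import atexit


class ToyEnv:
    """Hypothetical stand-in for a wrapped environment."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def register_env_close(env: ToyEnv, close_environment_at_exit: bool = True):
    """Hypothetical helper: schedule env.close() for normal interpreter exit."""
    if not close_environment_at_exit:
        return None

    def close_env() -> None:
        env.close()

    atexit.register(close_env)
    return close_env


env = ToyEnv()
handler = register_env_close(env)
# the call to env.close() is deferred until the interpreter exits normally
print(env.closed)  # False
```

Functions registered with `atexit` run when the interpreter terminates normally; they are not called if the process is killed by a signal Python does not handle or exits via `os._exit()`.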

disable_progressbar: bool | None = False

Whether to disable the progress bar. If None, disable it when the output is not a TTY (terminal).
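The three-valued option can be resolved as in the sketch below; the same `None` convention is used by the `disable` argument of the tqdm progress-bar library. `resolve_disable_progressbar` is a hypothetical helper, not part of skrl.

```python
import io
import sys
from typing import Optional, TextIO


def resolve_disable_progressbar(disable_progressbar: Optional[bool], stream: TextIO = sys.stdout) -> bool:
    """Hypothetical helper: turn the three-valued option into a final on/off decision."""
    if disable_progressbar is None:
        # None: disable only when the output stream is not an interactive terminal (TTY)
        return not stream.isatty()
    return disable_progressbar


print(resolve_disable_progressbar(False))  # False
# an in-memory stream is never a TTY, so None resolves to "disabled"
print(resolve_disable_progressbar(None, io.StringIO()))  # True
```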

environment_info: str = 'episode'

Key used to get and log environment info.
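A sketch of what getting and logging environment info under a key can look like, assuming the environment returns an `infos` dictionary and that only scalar entries stored under the configured key (by default `'episode'`) are logged. `log_environment_info` and the `"Info / <name>"` tag format are hypothetical.

```python
from typing import Any, Callable, Dict


def log_environment_info(infos: Dict[str, Any], key: str, track: Callable[[str, float], None]) -> None:
    """Hypothetical helper: forward scalar entries of infos[key] to a tracking callback."""
    if key not in infos:
        return  # nothing to log for this step
    for name, value in infos[key].items():
        # this sketch logs plain numbers only and skips everything else
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            track(f"Info / {name}", float(value))


logged: Dict[str, float] = {}
infos = {"episode": {"return": 12.5, "length": 200, "notes": "ok"}}
log_environment_info(infos, "episode", lambda tag, value: logged.__setitem__(tag, value))
print(logged)  # {'Info / return': 12.5, 'Info / length': 200.0}
```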

headless: bool = False

Whether to run in headless mode (do not call env.render()).

render_interval: int = 1

Interval (in timesteps) for rendering the environments. Only effective if headless is False.
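A sketch of the rendering condition, assuming timesteps are counted from 0 and rendering happens whenever the timestep is a multiple of `render_interval`. `should_render` is a hypothetical helper; the exact counting used by the trainer may differ.

```python
def should_render(timestep: int, headless: bool = False, render_interval: int = 1) -> bool:
    """Hypothetical helper: decide whether env.render() is called at this timestep."""
    if headless:
        # headless mode never renders, so render_interval has no effect
        return False
    if render_interval < 1:
        raise ValueError("render_interval must be a positive integer")
    return timestep % render_interval == 0


print([t for t in range(10) if should_render(t, render_interval=3)])  # [0, 3, 6, 9]
print(any(should_render(t, headless=True, render_interval=3) for t in range(10)))  # False
```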

stochastic_evaluation: bool = False

Whether to use stochastic (sampled) actions rather than deterministic mean actions during evaluation.
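A sketch of the choice this option controls, assuming a Gaussian policy that yields both sampled actions and its mean actions. `select_evaluation_actions` is a hypothetical helper, and plain lists stand in for tensors.

```python
import random
from typing import List


def select_evaluation_actions(sampled: List[float], mean: List[float], stochastic_evaluation: bool = False) -> List[float]:
    """Hypothetical helper: pick the actions sent to the environment during evaluation."""
    # stochastic: keep the sampled actions; deterministic (default): use the policy's mean actions
    return sampled if stochastic_evaluation else mean


mean_actions = [0.2, -0.5]
# toy Gaussian policy: samples are drawn around the mean with a standard deviation of 0.1
sampled_actions = [random.gauss(mu, 0.1) for mu in mean_actions]
print(select_evaluation_actions(sampled_actions, mean_actions))  # [0.2, -0.5]
```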

timesteps: int = 100000

Number of timesteps to train/evaluate for.

class skrl.trainers.torch.sequential.SequentialTrainer(*, env: Wrapper | MultiAgentEnvWrapper, agents: Agent | MultiAgent | list[Agent] | list[MultiAgent], scopes: list[int] | None = None, cfg: SequentialTrainerCfg | dict = {})

Bases: Trainer

Sequential trainer.

Train agents sequentially, i.e., one after the other, in each interaction with the environment.

Parameters:
  • env – Environment to train/evaluate on.

  • agents – Agent(s) to train/evaluate.

  • scopes – Number of environments for each agent when training/evaluating multiple agents simultaneously.

  • cfg – Configuration, as a SequentialTrainerCfg instance or a dictionary.
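A sketch of how per-agent environment counts can be turned into contiguous index ranges. The default split shown here (even division, with any remainder given to the last agent) and the `compute_agent_scopes` helper are assumptions for illustration, not a description of skrl's internals.

```python
from typing import List, Optional, Tuple


def compute_agent_scopes(num_envs: int, num_agents: int, scopes: Optional[List[int]] = None) -> List[Tuple[int, int]]:
    """Hypothetical helper: map each agent to a contiguous [start, end) range of environment indices."""
    if not scopes:
        # split as evenly as possible; the last agent takes any remainder
        scopes = [num_envs // num_agents] * num_agents
        scopes[-1] += num_envs - sum(scopes)
    if len(scopes) != num_agents:
        raise ValueError("one scope per agent is required")
    if sum(scopes) != num_envs:
        raise ValueError("scopes must add up to the number of environments")
    if any(size <= 0 for size in scopes):
        raise ValueError("each agent needs at least one environment")
    ranges, start = [], 0
    for size in scopes:
        ranges.append((start, start + size))
        start += size
    return ranges


print(compute_agent_scopes(10, 3))  # [(0, 3), (3, 6), (6, 10)]
print(compute_agent_scopes(8, 2, [6, 2]))  # [(0, 6), (6, 8)]
```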

Methods:

eval()

Evaluate agents sequentially.

train()

Train agents sequentially.

eval() → None

Evaluate agents sequentially.

This method executes the following steps in a loop:

  • Pre-interaction

  • Compute actions (sequentially)

  • Interact with the environments

  • Render environments

  • Record transitions

  • Reset environments
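The step that distinguishes this evaluation loop, computing actions sequentially, can be sketched in plain Python as below. Each agent is reduced to a hypothetical policy function queried on its own slice of environments; pre-interaction, rendering, recording and resets are left out, and `compute_actions_sequentially` and `evaluate_sequentially` are illustrative names only.

```python
from typing import Callable, List, Sequence, Tuple

# a policy maps the observations of its environments to one action per environment
Policy = Callable[[List[float]], List[float]]


def compute_actions_sequentially(
    policies: Sequence[Policy], scopes: Sequence[Tuple[int, int]], observations: List[float]
) -> List[float]:
    """Query each agent on its own [start, end) slice of environments, one agent after the other."""
    actions: List[float] = []
    for policy, (start, end) in zip(policies, scopes):
        actions.extend(policy(observations[start:end]))
    return actions


def evaluate_sequentially(policies, scopes, observations, step, timesteps: int) -> List[float]:
    """Hypothetical evaluation loop without hooks, rendering, recording or resets."""
    for _ in range(timesteps):
        actions = compute_actions_sequentially(policies, scopes, observations)
        observations = step(observations, actions)  # a single step for all environments
    return observations


policies = [lambda obs: [0.0 for _ in obs], lambda obs: [1.0 for _ in obs]]
scopes = [(0, 2), (2, 3)]
print(compute_actions_sequentially(policies, scopes, [5.0, 6.0, 7.0]))  # [0.0, 0.0, 1.0]


def toy_step(obs: List[float], act: List[float]) -> List[float]:
    # toy environment: each observation moves by the action applied to it
    return [x + y for x, y in zip(obs, act)]


print(evaluate_sequentially(policies, scopes, [5.0, 6.0, 7.0], toy_step, 2))  # [5.0, 6.0, 9.0]
```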

train() → None

Train agents sequentially.

This method executes the following steps in a loop:

  • Pre-interaction (sequentially)

  • Compute actions (sequentially)

  • Interact with the environments

  • Render environments

  • Record transitions (sequentially)

  • Post-interaction (sequentially)

  • Reset environments
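The training steps above can be sketched with toy objects that record the order of calls. `ToyAgent`, its method names, `toy_step` and `train_sequentially` are hypothetical stand-ins chosen to mirror the listed steps, not skrl's Agent API; rendering and resets are omitted and plain lists stand in for tensors.

```python
from typing import List, Tuple


class ToyAgent:
    """Hypothetical agent: logs each call the trainer makes on it."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name, self.log = name, log

    def pre_interaction(self, timestep: int) -> None:
        self.log.append(f"{self.name}.pre")

    def act(self, observations: List[float]) -> List[float]:
        self.log.append(f"{self.name}.act")
        return [-o for o in observations]  # toy policy: push each observation back toward 0

    def record_transition(self, observations, actions, rewards, next_observations) -> None:
        self.log.append(f"{self.name}.record")

    def post_interaction(self, timestep: int) -> None:
        self.log.append(f"{self.name}.post")


def toy_step(observations: List[float], actions: List[float]) -> Tuple[List[float], List[float]]:
    """Hypothetical vectorized environment step: returns next observations and rewards."""
    next_observations = [o + a for o, a in zip(observations, actions)]
    rewards = [-abs(o) for o in next_observations]
    return next_observations, rewards


def train_sequentially(agents, scopes, observations: List[float], timesteps: int) -> List[float]:
    for timestep in range(timesteps):
        for agent in agents:  # pre-interaction (sequentially)
            agent.pre_interaction(timestep)
        actions: List[float] = []
        for agent, (start, end) in zip(agents, scopes):  # compute actions (sequentially)
            actions.extend(agent.act(observations[start:end]))
        next_observations, rewards = toy_step(observations, actions)  # interact with all environments
        for agent, (start, end) in zip(agents, scopes):  # record transitions (sequentially)
            agent.record_transition(observations[start:end], actions[start:end],
                                    rewards[start:end], next_observations[start:end])
        for agent in agents:  # post-interaction (sequentially)
            agent.post_interaction(timestep)
        observations = next_observations  # no reset: the toy environments never terminate
    return observations


log: List[str] = []
agents = [ToyAgent("a0", log), ToyAgent("a1", log)]
final = train_sequentially(agents, [(0, 2), (2, 3)], [1.0, -2.0, 0.5], timesteps=1)
print(final)  # [0.0, 0.0, 0.0]
print(log)  # ['a0.pre', 'a1.pre', 'a0.act', 'a1.act', 'a0.record', 'a1.record', 'a0.post', 'a1.post']
```

Every per-agent phase finishes for all agents before the next phase starts, while the environment is stepped once per iteration for all agents together.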


JAX

SequentialTrainerCfg

Configuration for the sequential trainer.

SequentialTrainer

Sequential trainer.

class skrl.trainers.jax.sequential.SequentialTrainerCfg(*, timesteps: int = 100000, headless: bool = False, render_interval: int = 1, disable_progressbar: bool | None = False, close_environment_at_exit: bool = True, environment_info: str = 'episode', stochastic_evaluation: bool = False)

Bases: TrainerCfg

Configuration for the sequential trainer.

Methods:

expand()

Expand the configuration.

validate()

Validate the configuration.

Attributes:

close_environment_at_exit

Whether to close the environment on normal program termination.

disable_progressbar

Whether to disable the progress bar.

environment_info

Key used to get and log environment info.

headless

Whether to run in headless mode (do not call env.render()).

render_interval

Interval (in timesteps) for rendering the environments.

stochastic_evaluation

Whether to use stochastic (sampled) actions rather than deterministic mean actions during evaluation.

timesteps

Number of timesteps to train/evaluate for.

expand() → None

Expand the configuration.

validate() → bool

Validate the configuration.

close_environment_at_exit: bool = True

Whether to close the environment on normal program termination.

disable_progressbar: bool | None = False

Whether to disable the progress bar. If None, disable it when the output is not a TTY (terminal).

environment_info: str = 'episode'

Key used to get and log environment info.

headless: bool = False

Whether to run in headless mode (do not call env.render()).

render_interval: int = 1

Interval (in timesteps) for rendering the environments. Only effective if headless is False.

stochastic_evaluation: bool = False

Whether to use stochastic (sampled) actions rather than deterministic mean actions during evaluation.

timesteps: int = 100000

Number of timesteps to train/evaluate for.

class skrl.trainers.jax.sequential.SequentialTrainer(*, env: Wrapper | MultiAgentEnvWrapper, agents: Agent | MultiAgent | list[Agent] | list[MultiAgent], scopes: list[int] | None = None, cfg: SequentialTrainerCfg | dict = {})

Bases: Trainer

Sequential trainer.

Train agents sequentially, i.e., one after the other, in each interaction with the environment.

Parameters:
  • env – Environment to train/evaluate on.

  • agents – Agent(s) to train/evaluate.

  • scopes – Number of environments for each agent when training/evaluating multiple agents simultaneously.

  • cfg – Configuration, as a SequentialTrainerCfg instance or a dictionary.

Methods:

eval()

Evaluate agents sequentially.

train()

Train agents sequentially.

eval() → None

Evaluate agents sequentially.

This method executes the following steps in a loop:

  • Pre-interaction

  • Compute actions (sequentially)

  • Interact with the environments

  • Render environments

  • Record transitions

  • Reset environments

train() → None

Train agents sequentially.

This method executes the following steps in a loop:

  • Pre-interaction (sequentially)

  • Compute actions (sequentially)

  • Interact with the environments

  • Render environments

  • Record transitions (sequentially)

  • Post-interaction (sequentially)

  • Reset environments


Warp

SequentialTrainerCfg

Configuration for the sequential trainer.

SequentialTrainer

Sequential trainer.