Multi-agents

Multi-agents are autonomous entities that interact with the environment to learn and improve their behavior. Their goal is to learn optimal policies, i.e., mappings from states to actions that maximize the cumulative reward received from the environment over time.
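
In standard terms (a general formulation rather than anything library-specific), each agent \(i\) among \(N\) agents seeks a policy \(\pi_i\) that maximizes its expected discounted return:

\[\pi_i^{*} = \arg\max_{\pi_i} \, \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} \, r_i(s_t, a_t^{1}, \ldots, a_t^{N})\right]\]

where \(\gamma \in [0, 1)\) is the discount factor, \(r_i\) is the reward received by agent \(i\), and the joint action of all agents drives the shared state \(s_t\).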



Multi-agents                                        pytorch   jax
Independent Proximal Policy Optimization (IPPO)        ■        ■
Multi-Agent Proximal Policy Optimization (MAPPO)       ■        ■
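
These implementations can be imported from their respective modules, for example (module paths as found in recent skrl versions; verify against the installed release):

# PyTorch implementations
from skrl.multi_agents.torch.ippo import IPPO, IPPO_DEFAULT_CONFIG
from skrl.multi_agents.torch.mappo import MAPPO, MAPPO_DEFAULT_CONFIG

# JAX implementations (use one backend or the other, not both at once)
# from skrl.multi_agents.jax.ippo import IPPO, IPPO_DEFAULT_CONFIG
# from skrl.multi_agents.jax.mappo import MAPPO, MAPPO_DEFAULT_CONFIG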

Base class

Note

This is the base class for all multi-agents and provides only basic functionality that is not tied to any implementation of the optimization algorithms. It is not intended to be used directly.


Basic inheritance usage

from typing import Union, Dict, Any, Optional, Sequence, Mapping

import gym, gymnasium
import copy

import torch

from skrl.memories.torch import Memory
from skrl.models.torch import Model

from skrl.multi_agents.torch import MultiAgent


CUSTOM_DEFAULT_CONFIG = {
    # ...

    "experiment": {
        "directory": "",            # experiment's parent directory
        "experiment_name": "",      # experiment name
        "write_interval": 250,      # TensorBoard writing interval (timesteps)

        "checkpoint_interval": 1000,        # interval for checkpoints (timesteps)
        "store_separately": False,          # whether to store checkpoints separately

        "wandb": False,             # whether to use Weights & Biases
        "wandb_kwargs": {}          # wandb kwargs (see https://docs.wandb.ai/ref/python/init)
    }
}


class CUSTOM(MultiAgent):
    def __init__(self,
                 possible_agents: Sequence[str],
                 models: Mapping[str, Mapping[str, Model]],
                 memories: Optional[Mapping[str, Memory]] = None,
                 observation_spaces: Optional[Union[Mapping[str, int], Mapping[str, gym.Space], Mapping[str, gymnasium.Space]]] = None,
                 action_spaces: Optional[Union[Mapping[str, int], Mapping[str, gym.Space], Mapping[str, gymnasium.Space]]] = None,
                 device: Optional[Union[str, torch.device]] = None,
                 cfg: Optional[dict] = None) -> None:
        """Custom multi-agent

        :param possible_agents: Names of all possible agents the environment could generate
        :type possible_agents: list of str
        :param models: Models used by the agents.
                       External keys are environment agents' names. Internal keys are the models required by the algorithm
        :type models: nested dictionary of skrl.models.torch.Model
        :param memories: Memories to store the transitions.
        :type memories: dictionary of skrl.memories.torch.Memory, optional
        :param observation_spaces: Observation/state spaces or shapes (default: ``None``)
        :type observation_spaces: dictionary of int, sequence of int, gym.Space or gymnasium.Space, optional
        :param action_spaces: Action spaces or shapes (default: ``None``)
        :type action_spaces: dictionary of int, sequence of int, gym.Space or gymnasium.Space, optional
        :param device: Device on which a torch tensor is or will be allocated (default: ``None``).
                       If None, the device will be either ``"cuda:0"`` if available or ``"cpu"``
        :type device: str or torch.device, optional
        :param cfg: Configuration dictionary
        :type cfg: dict
        """
        _cfg = copy.deepcopy(CUSTOM_DEFAULT_CONFIG)
        _cfg.update(cfg if cfg is not None else {})
        super().__init__(possible_agents=possible_agents,
                         models=models,
                         memories=memories,
                         observation_spaces=observation_spaces,
                         action_spaces=action_spaces,
                         device=device,
                         cfg=_cfg)
        # =======================================================================
        # - get and process models from `self.models`
        # - populate `self.checkpoint_modules` dictionary for storing checkpoints
        # - parse configurations from `self.cfg`
        # - setup optimizers and learning rate scheduler
        # - set up preprocessors
        # =======================================================================

    def init(self, trainer_cfg: Optional[Dict[str, Any]] = None) -> None:
        """Initialize the agent
        """
        super().init(trainer_cfg=trainer_cfg)
        self.set_mode("eval")
        # =================================================================
        # - create tensors in memory if required
        # - create temporary variables needed for storage and computation
        # =================================================================

    def act(self, states: Mapping[str, torch.Tensor], timestep: int, timesteps: int) -> torch.Tensor:
        """Process the environment's states to make a decision (actions) using the main policies

        :param states: Environment's states
        :type states: dictionary of torch.Tensor
        :param timestep: Current timestep
        :type timestep: int
        :param timesteps: Number of timesteps
        :type timesteps: int

        :return: Actions
        :rtype: torch.Tensor
        """
        # ======================================
        # - sample random actions if required or
        #   sample and return agent's actions
        # ======================================

    def record_transition(self,
                          states: Mapping[str, torch.Tensor],
                          actions: Mapping[str, torch.Tensor],
                          rewards: Mapping[str, torch.Tensor],
                          next_states: Mapping[str, torch.Tensor],
                          terminated: Mapping[str, torch.Tensor],
                          truncated: Mapping[str, torch.Tensor],
                          infos: Mapping[str, Any],
                          timestep: int,
                          timesteps: int) -> None:
        """Record an environment transition in memory

        :param states: Observations/states of the environment used to make the decision
        :type states: dictionary of torch.Tensor
        :param actions: Actions taken by the agent
        :type actions: dictionary of torch.Tensor
        :param rewards: Instant rewards achieved by the current actions
        :type rewards: dictionary of torch.Tensor
        :param next_states: Next observations/states of the environment
        :type next_states: dictionary of torch.Tensor
        :param terminated: Signals to indicate that episodes have terminated
        :type terminated: dictionary of torch.Tensor
        :param truncated: Signals to indicate that episodes have been truncated
        :type truncated: dictionary of torch.Tensor
        :param infos: Additional information about the environment
        :type infos: dictionary of any supported type
        :param timestep: Current timestep
        :type timestep: int
        :param timesteps: Number of timesteps
        :type timesteps: int
        """
        super().record_transition(states, actions, rewards, next_states, terminated, truncated, infos, timestep, timesteps)
        # ========================================
        # - record agent's specific data in memory
        # ========================================

    def pre_interaction(self, timestep: int, timesteps: int) -> None:
        """Callback called before the interaction with the environment

        :param timestep: Current timestep
        :type timestep: int
        :param timesteps: Number of timesteps
        :type timesteps: int
        """
        # =====================================
        # - call `self.update(...)` if required
        # =====================================

    def post_interaction(self, timestep: int, timesteps: int) -> None:
        """Callback called after the interaction with the environment

        :param timestep: Current timestep
        :type timestep: int
        :param timesteps: Number of timesteps
        :type timesteps: int
        """
        # =====================================
        # - call `self.update(...)` if required
        # =====================================
        # call parent's method for checkpointing and TensorBoard writing
        super().post_interaction(timestep, timesteps)

    def _update(self, timestep: int, timesteps: int) -> None:
        """Algorithm's main update step

        :param timestep: Current timestep
        :type timestep: int
        :param timesteps: Number of timesteps
        :type timesteps: int
        """
        # ===================================================
        # - implement algorithm's update step
        # - record tracking data using `self.track_data(...)`
        # ===================================================
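
A minimal sketch of how such a class might be instantiated, assuming ``env`` is an already-wrapped multi-agent environment exposing possible_agents, observation_spaces, action_spaces and device (the model and memory instances are placeholders to be filled in):

# hypothetical setup: `env` is assumed to be a wrapped multi-agent environment
models = {}
memories = {}
for name in env.possible_agents:
    # skrl.models.torch.Model and skrl.memories.torch.Memory instances (placeholders)
    models[name] = {"policy": ..., "value": ...}
    memories[name] = ...

cfg = copy.deepcopy(CUSTOM_DEFAULT_CONFIG)
cfg["experiment"]["write_interval"] = 500

agent = CUSTOM(possible_agents=env.possible_agents,
               models=models,
               memories=memories,
               observation_spaces=env.observation_spaces,
               action_spaces=env.action_spaces,
               device=env.device,
               cfg=cfg)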

API (PyTorch)

class skrl.multi_agents.torch.base.MultiAgent(possible_agents: Sequence[str], models: Mapping[str, Mapping[str, Model]], memories: Mapping[str, Memory] | None = None, observation_spaces: Mapping[str, int | Sequence[int] | gym.Space | gymnasium.Space] | None = None, action_spaces: Mapping[str, int | Sequence[int] | gym.Space | gymnasium.Space] | None = None, device: str | torch.device | None = None, cfg: dict | None = None)

Bases: object

__init__(possible_agents: Sequence[str], models: Mapping[str, Mapping[str, Model]], memories: Mapping[str, Memory] | None = None, observation_spaces: Mapping[str, int | Sequence[int] | gym.Space | gymnasium.Space] | None = None, action_spaces: Mapping[str, int | Sequence[int] | gym.Space | gymnasium.Space] | None = None, device: str | torch.device | None = None, cfg: dict | None = None) None

Base class that represents an RL multi-agent

Parameters:
  • possible_agents (list of str) – Names of all possible agents the environment could generate

  • models (nested dictionary of skrl.models.torch.Model) – Models used by the agents. External keys are environment agents’ names. Internal keys are the models required by the algorithm

  • memories (dictionary of skrl.memories.torch.Memory, optional) – Memories to store the transitions.

  • observation_spaces (dictionary of int, sequence of int, gym.Space or gymnasium.Space, optional) – Observation/state spaces or shapes (default: None)

  • action_spaces (dictionary of int, sequence of int, gym.Space or gymnasium.Space, optional) – Action spaces or shapes (default: None)

  • device (str or torch.device, optional) – Device on which a tensor/array is or will be allocated (default: None). If None, the device will be either "cuda" if available or "cpu"

  • cfg (dict) – Configuration dictionary

__str__() str

Generate a representation of the agent as a string

Returns:

Representation of the agent as a string

Return type:

str

_as_dict(_input: Any) Mapping[str, Any]

Convert a configuration value into a dictionary according to the number of agents

Parameters:

_input (Any) – Configuration value

Raises:

ValueError – The configuration value is a dictionary whose number of entries does not match the number of agents

Returns:

Configuration value as a dictionary

Return type:

dictionary of any configuration value
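
Conceptually, the conversion behaves like the following simplified re-implementation (an illustration only; the actual library code may differ):

# simplified sketch of the broadcast behavior (illustration only)
def _as_dict_sketch(value, possible_agents):
    if isinstance(value, dict):
        if len(value) != len(possible_agents):
            raise ValueError("dictionary length does not match the number of agents")
        return value
    # a single (non-dictionary) value is broadcast to every agent
    return {name: value for name in possible_agents}

# _as_dict_sketch(5e-4, ["agent_0", "agent_1"]) -> {"agent_0": 0.0005, "agent_1": 0.0005}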

_empty_preprocessor(_input: Any, *args, **kwargs) Any

Empty preprocess method

This method is defined because PyTorch multiprocessing can’t pickle lambdas

Parameters:

_input (Any) – Input to preprocess

Returns:

Preprocessed input

Return type:

Any

_get_internal_value(_module: Any) Any

Get internal module/variable state/value

Parameters:

_module (Any) – Module or variable

Returns:

Module/variable state/value

Return type:

Any

_update(timestep: int, timesteps: int) None

Algorithm’s main update step

Parameters:
  • timestep (int) – Current timestep

  • timesteps (int) – Number of timesteps

Raises:

NotImplementedError – The method is not implemented by the inheriting classes

act(states: Mapping[str, torch.Tensor], timestep: int, timesteps: int) torch.Tensor

Process the environment’s states to make a decision (actions) using the main policies

Parameters:
  • states (dictionary of torch.Tensor) – Environment’s states

  • timestep (int) – Current timestep

  • timesteps (int) – Number of timesteps

Raises:

NotImplementedError – The method is not implemented by the inheriting classes

Returns:

Actions

Return type:

torch.Tensor
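
In inheriting classes this method typically queries each agent's policy and collects the per-agent outputs. A hedged sketch follows (self.policies is an assumed attribute populated in __init__; the actual IPPO/MAPPO implementations differ in detail):

# sketch of a typical override (assumed attributes; illustration only)
def act(self, states, timestep, timesteps):
    # query each agent's policy model with that agent's observations
    data = [self.policies[name].act({"states": states[name]}, role="policy")
            for name in self.possible_agents]
    # collect the per-agent actions for stepping the environment
    return {name: d[0] for name, d in zip(self.possible_agents, data)}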

init(trainer_cfg: Mapping[str, Any] | None = None) None

Initialize the agent

This method should be called before the agent is used. It will initialize the TensorBoard writer (and optionally Weights & Biases) and create the checkpoints directory

Parameters:

trainer_cfg (dict, optional) – Trainer configuration

load(path: str) None

Load the model from the specified path

The final storage device is determined by the constructor of the model

Parameters:

path (str) – Path to load the model from

migrate(path: str, name_map: Mapping[str, Mapping[str, str]] = {}, auto_mapping: bool = True, verbose: bool = False) bool

Migrate the specified external checkpoint to the current agent

The final storage device is determined by the constructor of the agent.

For ambiguous models (where two or more parameters of the source or current model have the same shape), the name_map must be defined, at least for those parameters, for the migration to succeed

Parameters:
  • path (str) – Path to the external checkpoint to migrate from

  • name_map (Mapping[str, Mapping[str, str]], optional) – Name map to use for the migration (default: {}). Keys are the current parameter names and values are the external parameter names

  • auto_mapping (bool, optional) – Automatically map the external state dict to the current state dict (default: True)

  • verbose (bool, optional) – Show model names and migration (default: False)

Raises:

ValueError – If the correct file type cannot be identified from the path parameter

Returns:

True if the migration was successful, False otherwise. Migration is successful if all parameters of the current model are found in the external model

Return type:

bool
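
A typical call looks as follows (the checkpoint path and parameter names are purely illustrative):

# migrate an external checkpoint (hypothetical path and parameter names)
agent.migrate(path="./external_checkpoint.pt",
              name_map={"policy": {"net.0.weight": "actor.fc1.weight"}},
              auto_mapping=True,
              verbose=True)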

post_interaction(timestep: int, timesteps: int) None

Callback called after the interaction with the environment

Parameters:
  • timestep (int) – Current timestep

  • timesteps (int) – Number of timesteps

pre_interaction(timestep: int, timesteps: int) None

Callback called before the interaction with the environment

Parameters:
  • timestep (int) – Current timestep

  • timesteps (int) – Number of timesteps

record_transition(states: Mapping[str, torch.Tensor], actions: Mapping[str, torch.Tensor], rewards: Mapping[str, torch.Tensor], next_states: Mapping[str, torch.Tensor], terminated: Mapping[str, torch.Tensor], truncated: Mapping[str, torch.Tensor], infos: Mapping[str, Any], timestep: int, timesteps: int) None

Record an environment transition in memory (to be implemented by the inheriting classes)

Inheriting classes must call this method to record episode information (rewards, timesteps, etc.). In addition to recording the environment transition (states, rewards, etc.), agent-specific information can be recorded.

Parameters:
  • states (dictionary of torch.Tensor) – Observations/states of the environment used to make the decision

  • actions (dictionary of torch.Tensor) – Actions taken by the agent

  • rewards (dictionary of torch.Tensor) – Instant rewards achieved by the current actions

  • next_states (dictionary of torch.Tensor) – Next observations/states of the environment

  • terminated (dictionary of torch.Tensor) – Signals to indicate that episodes have terminated

  • truncated (dictionary of torch.Tensor) – Signals to indicate that episodes have been truncated

  • infos (dictionary of any supported type) – Additional information about the environment

  • timestep (int) – Current timestep

  • timesteps (int) – Number of timesteps

save(path: str) None

Save the agent to the specified path

Parameters:

path (str) – Path to save the model to
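
For example (illustrative path):

# write all checkpoint modules to disk, then restore them later
agent.save("./runs/experiment/checkpoints/agent.pt")
agent.load("./runs/experiment/checkpoints/agent.pt")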

set_mode(mode: str) None

Set the model mode (training or evaluation)

Parameters:

mode (str) – Mode: ‘train’ for training or ‘eval’ for evaluation

set_running_mode(mode: str) None

Set the current running mode (training or evaluation)

This method sets the value of the training property (boolean), which can be used to determine whether the agent is running in training or evaluation mode.

Parameters:

mode (str) – Mode: ‘train’ for training or ‘eval’ for evaluation

track_data(tag: str, value: float) None

Track data to TensorBoard

Currently only scalar data are supported

Parameters:
  • tag (str) – Data identifier (e.g. ‘Loss / policy loss’)

  • value (float) – Value to track
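
This method is typically called from within _update, for example (tags and variables are illustrative):

# inside _update (illustrative tags and values)
self.track_data("Loss / Policy loss", policy_loss.item())
self.track_data("Loss / Value loss", value_loss.item())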

write_checkpoint(timestep: int, timesteps: int) None

Write checkpoint (modules) to disk

The checkpoints are saved in the directory ‘checkpoints’ in the experiment directory. The name of the checkpoint is the current timestep if timestep is not None, otherwise it is the current time.

Parameters:
  • timestep (int) – Current timestep

  • timesteps (int) – Number of timesteps

write_tracking_data(timestep: int, timesteps: int) None

Write tracking data to TensorBoard

Parameters:
  • timestep (int) – Current timestep

  • timesteps (int) – Number of timesteps


API (JAX)

class skrl.multi_agents.jax.base.MultiAgent(possible_agents: Sequence[str], models: Mapping[str, Mapping[str, Model]], memories: Mapping[str, Memory] | None = None, observation_spaces: Mapping[str, int | Sequence[int] | gym.Space | gymnasium.Space] | None = None, action_spaces: Mapping[str, int | Sequence[int] | gym.Space | gymnasium.Space] | None = None, device: str | jax.Device | None = None, cfg: dict | None = None)

Bases: object

__init__(possible_agents: Sequence[str], models: Mapping[str, Mapping[str, Model]], memories: Mapping[str, Memory] | None = None, observation_spaces: Mapping[str, int | Sequence[int] | gym.Space | gymnasium.Space] | None = None, action_spaces: Mapping[str, int | Sequence[int] | gym.Space | gymnasium.Space] | None = None, device: str | jax.Device | None = None, cfg: dict | None = None) None

Base class that represents an RL multi-agent

Parameters:
  • possible_agents (list of str) – Names of all possible agents the environment could generate

  • models (nested dictionary of skrl.models.jax.Model) – Models used by the agents. External keys are environment agents’ names. Internal keys are the models required by the algorithm

  • memories (dictionary of skrl.memories.jax.Memory, optional) – Memories to store the transitions.

  • observation_spaces (dictionary of int, sequence of int, gym.Space or gymnasium.Space, optional) – Observation/state spaces or shapes (default: None)

  • action_spaces (dictionary of int, sequence of int, gym.Space or gymnasium.Space, optional) – Action spaces or shapes (default: None)

  • device (str or jax.Device, optional) – Device on which a tensor/array is or will be allocated (default: None). If None, the device will be either "cuda" if available or "cpu"

  • cfg (dict) – Configuration dictionary
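
For example, the device parameter accepts either a string or a jax.Device (a brief sketch; other constructor arguments are omitted and IPPO stands in for any multi-agent implementation):

import jax
from skrl.multi_agents.jax.ippo import IPPO

# equivalent ways to select the computation device (sketch)
# agent = IPPO(..., device="cuda:0")
# agent = IPPO(..., device=jax.devices()[0])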

__str__() str

Generate a representation of the agent as a string

Returns:

Representation of the agent as a string

Return type:

str

_as_dict(_input: Any) Mapping[str, Any]

Convert a configuration value into a dictionary according to the number of agents

Parameters:

_input (Any) – Configuration value

Raises:

ValueError – The configuration value is a dictionary whose number of entries does not match the number of agents

Returns:

Configuration value as a dictionary

Return type:

dictionary of any configuration value

_empty_preprocessor(_input: Any, *args, **kwargs) Any

Empty preprocess method

This method is defined because Python’s multiprocessing can’t pickle lambdas

Parameters:

_input (Any) – Input to preprocess

Returns:

Preprocessed input

Return type:

Any

_get_internal_value(_module: Any) Any

Get internal module/variable state/value

Parameters:

_module (Any) – Module or variable

Returns:

Module/variable state/value

Return type:

Any

_update(timestep: int, timesteps: int) None

Algorithm’s main update step

Parameters:
  • timestep (int) – Current timestep

  • timesteps (int) – Number of timesteps

Raises:

NotImplementedError – The method is not implemented by the inheriting classes

act(states: Mapping[str, ndarray | jax.Array], timestep: int, timesteps: int) ndarray | jax.Array

Process the environment’s states to make a decision (actions) using the main policies

Parameters:
  • states (dictionary of np.ndarray or jax.Array) – Environment’s states

  • timestep (int) – Current timestep

  • timesteps (int) – Number of timesteps

Raises:

NotImplementedError – The method is not implemented by the inheriting classes

Returns:

Actions

Return type:

np.ndarray or jax.Array

init(trainer_cfg: Mapping[str, Any] | None = None) None

Initialize the agent

This method should be called before the agent is used. It will initialize the TensorBoard writer (and optionally Weights & Biases) and create the checkpoints directory

Parameters:

trainer_cfg (dict, optional) – Trainer configuration

load(path: str) None

Load the model from the specified path

Parameters:

path (str) – Path to load the model from

migrate(path: str, name_map: Mapping[str, Mapping[str, str]] = {}, auto_mapping: bool = True, verbose: bool = False) bool

Migrate the specified external checkpoint to the current agent

Raises:

NotImplementedError – Not yet implemented

post_interaction(timestep: int, timesteps: int) None

Callback called after the interaction with the environment

Parameters:
  • timestep (int) – Current timestep

  • timesteps (int) – Number of timesteps

pre_interaction(timestep: int, timesteps: int) None

Callback called before the interaction with the environment

Parameters:
  • timestep (int) – Current timestep

  • timesteps (int) – Number of timesteps

record_transition(states: Mapping[str, ndarray | jax.Array], actions: Mapping[str, ndarray | jax.Array], rewards: Mapping[str, ndarray | jax.Array], next_states: Mapping[str, ndarray | jax.Array], terminated: Mapping[str, ndarray | jax.Array], truncated: Mapping[str, ndarray | jax.Array], infos: Mapping[str, Any], timestep: int, timesteps: int) None

Record an environment transition in memory (to be implemented by the inheriting classes)

Inheriting classes must call this method to record episode information (rewards, timesteps, etc.). In addition to recording the environment transition (states, rewards, etc.), agent-specific information can be recorded.

Parameters:
  • states (dictionary of np.ndarray or jax.Array) – Observations/states of the environment used to make the decision

  • actions (dictionary of np.ndarray or jax.Array) – Actions taken by the agent

  • rewards (dictionary of np.ndarray or jax.Array) – Instant rewards achieved by the current actions

  • next_states (dictionary of np.ndarray or jax.Array) – Next observations/states of the environment

  • terminated (dictionary of np.ndarray or jax.Array) – Signals to indicate that episodes have terminated

  • truncated (dictionary of np.ndarray or jax.Array) – Signals to indicate that episodes have been truncated

  • infos (dictionary of any type supported by the environment) – Additional information about the environment

  • timestep (int) – Current timestep

  • timesteps (int) – Number of timesteps

save(path: str) None

Save the agent to the specified path

Parameters:

path (str) – Path to save the model to

set_mode(mode: str) None

Set the model mode (training or evaluation)

Parameters:

mode (str) – Mode: ‘train’ for training or ‘eval’ for evaluation

set_running_mode(mode: str) None

Set the current running mode (training or evaluation)

This method sets the value of the training property (boolean), which can be used to determine whether the agent is running in training or evaluation mode.

Parameters:

mode (str) – Mode: ‘train’ for training or ‘eval’ for evaluation

track_data(tag: str, value: float) None

Track data to TensorBoard

Currently only scalar data are supported

Parameters:
  • tag (str) – Data identifier (e.g. ‘Loss / policy loss’)

  • value (float) – Value to track

write_checkpoint(timestep: int, timesteps: int) None

Write checkpoint (modules) to disk

The checkpoints are saved in the directory ‘checkpoints’ in the experiment directory. The name of the checkpoint is the current timestep if timestep is not None, otherwise it is the current time.

Parameters:
  • timestep (int) – Current timestep

  • timesteps (int) – Number of timesteps

write_tracking_data(timestep: int, timesteps: int) None

Write tracking data to TensorBoard

Parameters:
  • timestep (int) – Current timestep

  • timesteps (int) – Number of timesteps