Wrapping (multi-agents)
This library provides a common API for interacting with the following multi-agent reinforcement learning (RL) environments:
Farama PettingZoo (parallel API) and Shimmy
NVIDIA Isaac Lab
To work with these environments out of the box, and to support interoperability between their otherwise incompatible interfaces, the library provides a wrapping mechanism, as shown in the following image.
Usage
The following snippets show how to wrap multi-agent environments from the different supported libraries:
Isaac Lab (PyTorch):
# import the environment wrapper and loader
from skrl.envs.wrappers.torch import wrap_env
from skrl.envs.loaders.torch import load_isaaclab_env
# load the environment
env = load_isaaclab_env(task_name="Isaac-Cart-Double-Pendulum-Direct-v0")
# wrap the environment
env = wrap_env(env) # or 'env = wrap_env(env, wrapper="isaaclab-multi-agent")'
Isaac Lab (JAX):
# import the environment wrapper and loader
from skrl.envs.wrappers.jax import wrap_env
from skrl.envs.loaders.jax import load_isaaclab_env
# load the environment
env = load_isaaclab_env(task_name="Isaac-Cart-Double-Pendulum-Direct-v0")
# wrap the environment
env = wrap_env(env) # or 'env = wrap_env(env, wrapper="isaaclab-multi-agent")'
PettingZoo (PyTorch):
# import the environment wrapper
from skrl.envs.wrappers.torch import wrap_env
# import a PettingZoo environment
from pettingzoo.sisl import multiwalker_v9
# load the environment
env = multiwalker_v9.parallel_env()
# wrap the environment
env = wrap_env(env) # or 'env = wrap_env(env, wrapper="pettingzoo")'
PettingZoo (JAX):
# import the environment wrapper
from skrl.envs.wrappers.jax import wrap_env
# import a PettingZoo environment
from pettingzoo.sisl import multiwalker_v9
# load the environment
env = multiwalker_v9.parallel_env()
# wrap the environment
env = wrap_env(env) # or 'env = wrap_env(env, wrapper="pettingzoo")'
Shimmy (PyTorch):
# import the environment wrapper
from skrl.envs.wrappers.torch import wrap_env
# import the shimmy module
from shimmy import MeltingPotCompatibilityV0
# load the environment (API conversion)
env = MeltingPotCompatibilityV0(substrate_name="prisoners_dilemma_in_the_matrix__arena")
# wrap the environment
env = wrap_env(env) # or 'env = wrap_env(env, wrapper="pettingzoo")'
Shimmy (JAX):
# import the environment wrapper
from skrl.envs.wrappers.jax import wrap_env
# import the shimmy module
from shimmy import MeltingPotCompatibilityV0
# load the environment (API conversion)
env = MeltingPotCompatibilityV0(substrate_name="prisoners_dilemma_in_the_matrix__arena")
# wrap the environment
env = wrap_env(env) # or 'env = wrap_env(env, wrapper="pettingzoo")'
API
PyTorch
- skrl.envs.wrappers.torch.wrap_env(env: Any, wrapper: Literal['auto', 'gym', 'gymnasium', 'isaaclab', 'isaaclab-single-agent', 'isaaclab-multi-agent', 'mani-skill', 'pettingzoo', 'playground'] = 'auto', verbose: bool = True) -> Wrapper | MultiAgentEnvWrapper
Wrap an environment to use a common interface.
Example:
>>> from skrl.envs.wrappers.torch import wrap_env
>>>
>>> # assuming that there is an environment called "env"
>>> env = wrap_env(env)
- Parameters:
env – The environment instance to be wrapped.
wrapper – The type of wrapper to use. If "auto", the wrapper will be automatically selected based on the environment class. The supported wrappers are described in the following tables:

Single-agent environments
Environment | Wrapper tag
OpenAI Gym | "gym"
Gymnasium | "gymnasium"
Isaac Lab | "isaaclab" ("isaaclab-single-agent")
ManiSkill | "mani-skill"
MuJoCo Playground | "playground"

Multi-agent environments
Environment | Wrapper tag
PettingZoo | "pettingzoo"
Isaac Lab | "isaaclab" ("isaaclab-multi-agent")

verbose – Whether to print verbose information about the environment and the wrapper.
- Returns:
Wrapped environment instance.
- Raises:
ValueError – Unknown wrapper type.
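As a rough illustration of how the wrapper tags listed for the `wrapper` parameter relate to the `ValueError` case, the sketch below checks a tag against the documented values. The names `check_wrapper_tag` and `tag_kinds` are hypothetical helpers written for this example; they are not part of skrl, and the library's own selection logic for "auto" is not reproduced here.

```python
# Illustrative only: the names below are hypothetical, not part of skrl.
SINGLE_AGENT_TAGS = {"gym", "gymnasium", "isaaclab", "isaaclab-single-agent", "mani-skill", "playground"}
MULTI_AGENT_TAGS = {"pettingzoo", "isaaclab", "isaaclab-multi-agent"}


def check_wrapper_tag(wrapper: str = "auto") -> str:
    """Return the tag unchanged if it is documented, otherwise raise ValueError."""
    if wrapper != "auto" and wrapper not in SINGLE_AGENT_TAGS | MULTI_AGENT_TAGS:
        raise ValueError(f"Unknown wrapper type: {wrapper}")
    return wrapper


def tag_kinds(wrapper: str) -> list[str]:
    """List which documented groups ("single-agent", "multi-agent") mention the tag."""
    kinds = []
    if wrapper in SINGLE_AGENT_TAGS:
        kinds.append("single-agent")
    if wrapper in MULTI_AGENT_TAGS:
        kinds.append("multi-agent")
    return kinds
```

Note that the generic "isaaclab" tag appears in both groups, which is why the explicit "isaaclab-single-agent" and "isaaclab-multi-agent" tags exist.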
JAX
- skrl.envs.wrappers.jax.wrap_env(env: Any, wrapper: Literal['auto', 'gym', 'gymnasium', 'isaaclab', 'isaaclab-single-agent', 'isaaclab-multi-agent', 'mani-skill', 'pettingzoo', 'playground'] = 'auto', verbose: bool = True) -> Wrapper | MultiAgentEnvWrapper
Wrap an environment to use a common interface.
Example:
>>> from skrl.envs.wrappers.jax import wrap_env
>>>
>>> # assuming that there is an environment called "env"
>>> env = wrap_env(env)
- Parameters:
env – The environment instance to be wrapped.
wrapper – The type of wrapper to use. If "auto", the wrapper will be automatically selected based on the environment class. The supported wrappers are described in the following tables:

Single-agent environments
Environment | Wrapper tag
OpenAI Gym | "gym"
Gymnasium | "gymnasium"
Isaac Lab | "isaaclab" ("isaaclab-single-agent")
ManiSkill | "mani-skill"
MuJoCo Playground | "playground"

Multi-agent environments
Environment | Wrapper tag
PettingZoo | "pettingzoo"
Isaac Lab | "isaaclab" ("isaaclab-multi-agent")

verbose – Whether to print verbose information about the environment and the wrapper.
- Returns:
Wrapped environment instance.
- Raises:
ValueError – Unknown wrapper type.
Internal API
PyTorch
MultiAgentEnvWrapper | Base wrapper class for multi-agent environments.
IsaacLabMultiAgentWrapper | Isaac Lab environment wrapper for multi-agent implementation.
PettingZooWrapper | PettingZoo (Parallel API) environment wrapper.
- class skrl.envs.wrappers.torch.MultiAgentEnvWrapper(env: Any)
Bases: ABC
Base wrapper class for multi-agent environments.
- Parameters:
env – The multi-agent environment instance to wrap.
Methods:
action_space(agent) | Action space.
close() | Close the environment.
observation_space(agent) | Observation space.
render(*args, **kwargs) | Render the environment.
reset() | Reset the environment.
state() | Get the environment state.
state_space(agent) | State space.
step(actions) | Perform a step in the environment.
Attributes:
action_spaces | Action spaces.
agents | Names of all current agents.
device | The device used by the environment.
max_num_agents | Number of possible agents the environment could generate.
num_agents | Number of current agents.
num_envs | Number of environments.
observation_spaces | Observation spaces.
possible_agents | Names of all possible agents the environment could generate.
state_spaces | State spaces.
- action_space(agent: str) -> gymnasium.Space
Action space.
- Parameters:
agent – Name of the agent.
- Returns:
The action space for the specified agent.
- observation_space(agent: str) -> gymnasium.Space
Observation space.
- Parameters:
agent – Name of the agent.
- Returns:
The observation space for the specified agent.
- abstractmethod render(*args, **kwargs) -> Any
Render the environment.
- Returns:
Any value from the wrapped environment.
- abstractmethod reset() -> tuple[dict[str, torch.Tensor], dict[str, Any]]
Reset the environment.
- Returns:
Observation, info.
- abstractmethod state() -> dict[str, torch.Tensor | None]
Get the environment state.
- Returns:
State.
- state_space(agent: str) -> gymnasium.Space | None
State space.
See state_spaces for more details.
- Parameters:
agent – Name of the agent.
- Returns:
The state space for the specified agent.
- abstractmethod step(actions: dict[str, torch.Tensor]) -> tuple[dict[str, torch.Tensor], dict[str, torch.Tensor], dict[str, torch.Tensor], dict[str, torch.Tensor], dict[str, Any]]
Perform a step in the environment.
- Parameters:
actions – The actions to perform.
- Returns:
Observation, reward, terminated, truncated, info.
- property agents: list[str]
Names of all current agents.
These may change as an environment progresses (i.e. agents can be added or removed).
- property device: torch.device
The device used by the environment.
If the wrapped environment does not have the device property, the value of this property will be "cuda" or "cpu" depending on the device availability.
- property max_num_agents: int
Number of possible agents the environment could generate.
Read from the length of the possible_agents property if the wrapped environment doesn’t define it.
- property num_agents: int
Number of current agents.
Read from the length of the agents property if the wrapped environment doesn’t define it.
- property num_envs: int
Number of environments.
If the wrapped environment does not have the num_envs property, it will be set to 1.
- property possible_agents: list[str]
Names of all possible agents the environment could generate.
These cannot be changed as an environment progresses.
- property state_spaces: dict[str, gymnasium.Space | None]
State spaces.
Although this property returns a dictionary, the space for each agent follows these rules:
- The wrapped environment has the state_space attribute (homogeneous state). The state is a global view of the environment, so the space is the same for all agents.
- The wrapped environment has the state_spaces attribute (heterogeneous state). The state may differ for each agent, so the agent spaces may also differ.
- The wrapped environment does not have the previous attributes. The space is None for all agents.
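The three `state_spaces` rules can be pictured with a small, framework-free sketch. The function `resolve_state_spaces` is a hypothetical helper written for this example, not skrl code, and the order in which the two attributes are checked is an assumption. Plain strings stand in for `gymnasium.Space` objects.

```python
from types import SimpleNamespace
from typing import Any, Optional


def resolve_state_spaces(env: Any, possible_agents: list[str]) -> dict[str, Optional[Any]]:
    """Hypothetical helper: build a per-agent dictionary of state spaces."""
    # homogeneous state: one global space, repeated for every agent
    if hasattr(env, "state_space"):
        return {agent: env.state_space for agent in possible_agents}
    # heterogeneous state: each agent may have its own space
    if hasattr(env, "state_spaces"):
        return {agent: env.state_spaces[agent] for agent in possible_agents}
    # neither attribute is defined: no state space for any agent
    return {agent: None for agent in possible_agents}


agents = ["agent_0", "agent_1"]
homogeneous = resolve_state_spaces(SimpleNamespace(state_space="global-space"), agents)
heterogeneous = resolve_state_spaces(
    SimpleNamespace(state_spaces={"agent_0": "space-0", "agent_1": "space-1"}), agents
)
undefined = resolve_state_spaces(SimpleNamespace(), agents)
```

Returning a dictionary in all three cases lets calling code index by agent name without first checking which kind of state the environment exposes.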
- class skrl.envs.wrappers.torch.isaaclab_envs.IsaacLabMultiAgentWrapper(env: Any)
Bases: MultiAgentEnvWrapper
Isaac Lab environment wrapper for multi-agent implementation.
- Parameters:
env – The environment instance to wrap.
Methods:
action_space(agent) | Action space.
close() | Close the environment.
observation_space(agent) | Observation space.
render(*args, **kwargs) | Render the environment.
reset() | Reset the environment.
state() | Get the environment state.
state_space(agent) | State space.
step(actions) | Perform a step in the environment.
Attributes:
action_spaces | Action spaces.
agents | Names of all current agents.
device | The device used by the environment.
max_num_agents | Number of possible agents the environment could generate.
num_agents | Number of current agents.
num_envs | Number of environments.
observation_spaces | Observation spaces.
possible_agents | Names of all possible agents the environment could generate.
state_spaces | State spaces.
- action_space(agent: str) -> gymnasium.Space
Action space.
- Parameters:
agent – Name of the agent.
- Returns:
The action space for the specified agent.
- observation_space(agent: str) -> gymnasium.Space
Observation space.
- Parameters:
agent – Name of the agent.
- Returns:
The observation space for the specified agent.
- reset() -> tuple[dict[str, torch.Tensor], dict[str, Any]]
Reset the environment.
- Returns:
Observation, info.
- state_space(agent: str) -> gymnasium.Space | None
State space.
See state_spaces for more details.
- Parameters:
agent – Name of the agent.
- Returns:
The state space for the specified agent.
- step(actions: dict[str, torch.Tensor]) -> tuple[dict[str, torch.Tensor], dict[str, torch.Tensor], dict[str, torch.Tensor], dict[str, torch.Tensor], dict[str, Any]]
Perform a step in the environment.
- Parameters:
actions – The actions to perform.
- Returns:
Observation, reward, terminated, truncated, info.
- property agents: list[str]
Names of all current agents.
These may change as an environment progresses (i.e. agents can be added or removed).
- property device: torch.device
The device used by the environment.
If the wrapped environment does not have the device property, the value of this property will be "cuda" or "cpu" depending on the device availability.
- property max_num_agents: int
Number of possible agents the environment could generate.
Read from the length of the possible_agents property if the wrapped environment doesn’t define it.
- property num_agents: int
Number of current agents.
Read from the length of the agents property if the wrapped environment doesn’t define it.
- property num_envs: int
Number of environments.
If the wrapped environment does not have the num_envs property, it will be set to 1.
- property possible_agents: list[str]
Names of all possible agents the environment could generate.
These cannot be changed as an environment progresses.
- property state_spaces: dict[str, gymnasium.Space | None]
State spaces.
Although this property returns a dictionary, the space for each agent follows these rules:
- The wrapped environment has the state_space attribute (homogeneous state). The state is a global view of the environment, so the space is the same for all agents.
- The wrapped environment has the state_spaces attribute (heterogeneous state). The state may differ for each agent, so the agent spaces may also differ.
- The wrapped environment does not have the previous attributes. The space is None for all agents.
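The fallback behaviour described for `num_envs`, `num_agents` and `max_num_agents` can be sketched with plain attribute lookups. The three functions below are hypothetical helpers written for this example (the wrappers expose these values as properties, and their internals may differ); `SimpleNamespace` objects stand in for wrapped environments.

```python
from types import SimpleNamespace
from typing import Any


def num_envs_of(env: Any) -> int:
    """Use the environment's num_envs if present, otherwise assume a single environment."""
    return getattr(env, "num_envs", 1)


def num_agents_of(env: Any) -> int:
    """Use num_agents if defined, otherwise count the current agents."""
    if hasattr(env, "num_agents"):
        return env.num_agents
    return len(env.agents)


def max_num_agents_of(env: Any) -> int:
    """Use max_num_agents if defined, otherwise count the possible agents."""
    if hasattr(env, "max_num_agents"):
        return env.max_num_agents
    return len(env.possible_agents)


# a stand-in environment that defines none of the optional attributes
bare_env = SimpleNamespace(agents=["cart"], possible_agents=["cart", "pendulum"])
# a stand-in vectorized environment that reports its own number of environments
vector_env = SimpleNamespace(num_envs=4096, agents=["cart"], possible_agents=["cart"])
```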
- class skrl.envs.wrappers.torch.pettingzoo_envs.PettingZooWrapper(env: Any)
Bases: MultiAgentEnvWrapper
PettingZoo (Parallel API) environment wrapper.
- Parameters:
env – The environment instance to wrap.
Methods:
action_space(agent) | Action space.
close() | Close the environment.
observation_space(agent) | Observation space.
render(*args, **kwargs) | Render the environment.
reset() | Reset the environment.
state() | Get the environment state.
state_space(agent) | State space.
step(actions) | Perform a step in the environment.
Attributes:
action_spaces | Action spaces.
agents | Names of all current agents.
device | The device used by the environment.
max_num_agents | Number of possible agents the environment could generate.
num_agents | Number of current agents.
num_envs | Number of environments.
observation_spaces | Observation spaces.
possible_agents | Names of all possible agents the environment could generate.
state_spaces | State spaces.
- action_space(agent: str) -> gymnasium.Space
Action space.
- Parameters:
agent – Name of the agent.
- Returns:
The action space for the specified agent.
- observation_space(agent: str) -> gymnasium.Space
Observation space.
- Parameters:
agent – Name of the agent.
- Returns:
The observation space for the specified agent.
- reset() -> tuple[dict[str, torch.Tensor], dict[str, Any]]
Reset the environment.
- Returns:
Observation, info.
- state() -> dict[str, torch.Tensor | None]
Get the environment state.
In PettingZoo, the state is a global view of the environment, so it is the same for all agents.
- Returns:
State.
- state_space(agent: str) -> gymnasium.Space | None
State space.
See state_spaces for more details.
- Parameters:
agent – Name of the agent.
- Returns:
The state space for the specified agent.
- step(actions: dict[str, torch.Tensor]) -> tuple[dict[str, torch.Tensor], dict[str, torch.Tensor], dict[str, torch.Tensor], dict[str, torch.Tensor], dict[str, Any]]
Perform a step in the environment.
- Parameters:
actions – The actions to perform.
- Returns:
Observation, reward, terminated, truncated, info.
- property agents: list[str]
Names of all current agents.
These may change as an environment progresses (i.e. agents can be added or removed).
- property device: torch.device
The device used by the environment.
If the wrapped environment does not have the device property, the value of this property will be "cuda" or "cpu" depending on the device availability.
- property max_num_agents: int
Number of possible agents the environment could generate.
Read from the length of the possible_agents property if the wrapped environment doesn’t define it.
- property num_agents: int
Number of current agents.
Read from the length of the agents property if the wrapped environment doesn’t define it.
- property num_envs: int
Number of environments.
If the wrapped environment does not have the num_envs property, it will be set to 1.
- property possible_agents: list[str]
Names of all possible agents the environment could generate.
These cannot be changed as an environment progresses.
- property state_spaces: dict[str, gymnasium.Space | None]
State spaces.
Although this property returns a dictionary, the space for each agent follows these rules:
- The wrapped environment has the state_space attribute (homogeneous state). The state is a global view of the environment, so the space is the same for all agents.
- The wrapped environment has the state_spaces attribute (heterogeneous state). The state may differ for each agent, so the agent spaces may also differ.
- The wrapped environment does not have the previous attributes. The space is None for all agents.
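To make the dictionary-based `reset()`, `step()` and `state()` signatures more concrete, here is a minimal interaction loop against a toy stand-in. `ToyParallelEnv` and `rollout` are hypothetical names invented for this sketch; it uses plain Python lists and floats where the real wrappers return tensors, and it only mimics the shape of the interface, not PettingZoo or skrl behaviour.

```python
import random


class ToyParallelEnv:
    """Hypothetical stand-in for a wrapped parallel environment (not a skrl class)."""

    possible_agents = ["walker_0", "walker_1"]

    def __init__(self, max_steps=3):
        self.max_steps = max_steps
        self.agents = []
        self._t = 0

    def reset(self):
        self.agents = list(self.possible_agents)
        self._t = 0
        observations = {agent: [0.0] for agent in self.agents}
        infos = {agent: {} for agent in self.agents}
        return observations, infos

    def state(self):
        # one global state, handed out under every agent's name
        global_state = [float(self._t)]
        return {agent: global_state for agent in self.possible_agents}

    def step(self, actions):
        self._t += 1
        time_limit_reached = self._t >= self.max_steps
        observations = {agent: [float(self._t)] for agent in self.agents}
        rewards = {agent: float(actions[agent]) for agent in self.agents}
        terminated = {agent: False for agent in self.agents}
        truncated = {agent: time_limit_reached for agent in self.agents}
        infos = {agent: {} for agent in self.agents}
        return observations, rewards, terminated, truncated, infos


def rollout(env, seed=0):
    """Run one episode with random actions; return the step count and per-agent returns."""
    rng = random.Random(seed)
    observations, infos = env.reset()
    returns = {agent: 0.0 for agent in env.agents}
    steps = 0
    while True:
        # every value passed to or returned from step() is keyed by agent name
        actions = {agent: rng.choice([0, 1]) for agent in env.agents}
        observations, rewards, terminated, truncated, infos = env.step(actions)
        steps += 1
        for agent, reward in rewards.items():
            returns[agent] += reward
        if all(terminated[agent] or truncated[agent] for agent in env.agents):
            break
    return steps, returns
```

Calling `rollout(ToyParallelEnv())` runs until every agent is terminated or truncated, which is the usual stopping condition for this kind of loop.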
JAX
MultiAgentEnvWrapper | Base wrapper class for multi-agent environments.
IsaacLabMultiAgentWrapper | Isaac Lab environment wrapper for multi-agent implementation.
PettingZooWrapper | PettingZoo (Parallel API) environment wrapper.
- class skrl.envs.wrappers.jax.MultiAgentEnvWrapper(env: Any)
Bases: ABC
Base wrapper class for multi-agent environments.
- Parameters:
env – The multi-agent environment instance to wrap.
Methods:
action_space(agent) | Action space.
close() | Close the environment.
observation_space(agent) | Observation space.
render(*args, **kwargs) | Render the environment.
reset() | Reset the environment.
state() | Get the environment state.
state_space(agent) | State space.
step(actions) | Perform a step in the environment.
Attributes:
action_spaces | Action spaces.
agents | Names of all current agents.
device | The device used by the environment.
max_num_agents | Number of possible agents the environment could generate.
num_agents | Number of current agents.
num_envs | Number of environments.
observation_spaces | Observation spaces.
possible_agents | Names of all possible agents the environment could generate.
state_spaces | State spaces.
- action_space(agent: str) -> gymnasium.Space
Action space.
- Parameters:
agent – Name of the agent.
- Returns:
The action space for the specified agent.
- observation_space(agent: str) -> gymnasium.Space
Observation space.
- Parameters:
agent – Name of the agent.
- Returns:
The observation space for the specified agent.
- abstractmethod render(*args, **kwargs) -> Any
Render the environment.
- Returns:
Any value from the wrapped environment.
- abstractmethod reset() -> tuple[dict[str, jax.Array], dict[str, Any]]
Reset the environment.
- Returns:
Observation, info.
- state_space(agent: str) -> gymnasium.Space | None
State space.
See state_spaces for more details.
- Parameters:
agent – Name of the agent.
- Returns:
The state space for the specified agent.
- abstractmethod step(actions: dict[str, jax.Array]) -> tuple[dict[str, jax.Array], dict[str, jax.Array], dict[str, jax.Array], dict[str, jax.Array], dict[str, Any]]
Perform a step in the environment.
- Parameters:
actions – The actions to perform.
- Returns:
Observation, reward, terminated, truncated, info.
- property agents: list[str]
Names of all current agents.
These may change as an environment progresses (i.e. agents can be added or removed).
- property device: jax.Device
The device used by the environment.
If the wrapped environment does not have the device property, the value of this property will be "cuda" or "cpu" depending on the device availability.
- property max_num_agents: int
Number of possible agents the environment could generate.
Read from the length of the possible_agents property if the wrapped environment doesn’t define it.
- property num_agents: int
Number of current agents.
Read from the length of the agents property if the wrapped environment doesn’t define it.
- property num_envs: int
Number of environments.
If the wrapped environment does not have the num_envs property, it will be set to 1.
- property possible_agents: list[str]
Names of all possible agents the environment could generate.
These cannot be changed as an environment progresses.
- property state_spaces: dict[str, gymnasium.Space | None]
State spaces.
Although this property returns a dictionary, the space for each agent follows these rules:
- The wrapped environment has the state_space attribute (homogeneous state). The state is a global view of the environment, so the space is the same for all agents.
- The wrapped environment has the state_spaces attribute (heterogeneous state). The state may differ for each agent, so the agent spaces may also differ.
- The wrapped environment does not have the previous attributes. The space is None for all agents.
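Because the base class derives from `ABC` and marks `render`, `reset` and `step` as abstract, it cannot be instantiated directly; a concrete wrapper has to implement those methods. The sketch below shows that pattern in a framework-free form. `WrapperSketch` and `DictEnvWrapper` are hypothetical classes written for this example and do not reproduce skrl's implementation; the toy "environment" is just a dict holding agent names.

```python
from abc import ABC, abstractmethod
from typing import Any


class WrapperSketch(ABC):
    """Hypothetical base class mirroring the abstract-method pattern (not skrl code)."""

    def __init__(self, env: Any) -> None:
        self._env = env

    @abstractmethod
    def reset(self) -> tuple[dict[str, Any], dict[str, Any]]:
        """Return (observations, infos), both keyed by agent name."""

    @abstractmethod
    def step(self, actions: dict[str, Any]) -> tuple:
        """Return (observations, rewards, terminated, truncated, infos)."""

    @abstractmethod
    def render(self, *args, **kwargs) -> Any:
        """Return whatever the wrapped environment renders."""


class DictEnvWrapper(WrapperSketch):
    """Concrete wrapper for a toy environment stored as a plain dict of agent names."""

    def reset(self):
        agents = self._env["agents"]
        return {agent: 0 for agent in agents}, {agent: {} for agent in agents}

    def step(self, actions):
        agents = self._env["agents"]
        observations = dict(actions)  # echo each action back as the next observation
        rewards = {agent: 1.0 for agent in agents}
        terminated = {agent: False for agent in agents}
        truncated = {agent: False for agent in agents}
        infos = {agent: {} for agent in agents}
        return observations, rewards, terminated, truncated, infos

    def render(self, *args, **kwargs):
        return None
```

Trying `WrapperSketch({"agents": ["a"]})` raises `TypeError`, while `DictEnvWrapper({"agents": ["a"]})` works because every abstract method is overridden.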
- class skrl.envs.wrappers.jax.isaaclab_envs.IsaacLabMultiAgentWrapper(env: Any)
Bases: MultiAgentEnvWrapper
Isaac Lab environment wrapper for multi-agent implementation.
- Parameters:
env – The environment instance to wrap.
Methods:
action_space(agent) | Action space.
close() | Close the environment.
observation_space(agent) | Observation space.
render(*args, **kwargs) | Render the environment.
reset() | Reset the environment.
state() | Get the environment state.
state_space(agent) | State space.
step(actions) | Perform a step in the environment.
Attributes:
action_spaces | Action spaces.
agents | Names of all current agents.
device | The device used by the environment.
max_num_agents | Number of possible agents the environment could generate.
num_agents | Number of current agents.
num_envs | Number of environments.
observation_spaces | Observation spaces.
possible_agents | Names of all possible agents the environment could generate.
state_spaces | State spaces.
- action_space(agent: str) -> gymnasium.Space
Action space.
- Parameters:
agent – Name of the agent.
- Returns:
The action space for the specified agent.
- observation_space(agent: str) -> gymnasium.Space
Observation space.
- Parameters:
agent – Name of the agent.
- Returns:
The observation space for the specified agent.
- reset() -> tuple[dict[str, jax.Array], dict[str, Any]]
Reset the environment.
- Returns:
Observation, info.
- state_space(agent: str) -> gymnasium.Space | None
State space.
See state_spaces for more details.
- Parameters:
agent – Name of the agent.
- Returns:
The state space for the specified agent.
- step(actions: dict[str, jax.Array]) -> tuple[dict[str, jax.Array], dict[str, jax.Array], dict[str, jax.Array], dict[str, jax.Array], dict[str, Any]]
Perform a step in the environment.
- Parameters:
actions – The actions to perform.
- Returns:
Observation, reward, terminated, truncated, info.
- property agents: list[str]
Names of all current agents.
These may change as an environment progresses (i.e. agents can be added or removed).
- property device: jax.Device
The device used by the environment.
If the wrapped environment does not have the device property, the value of this property will be "cuda" or "cpu" depending on the device availability.
- property max_num_agents: int
Number of possible agents the environment could generate.
Read from the length of the possible_agents property if the wrapped environment doesn’t define it.
- property num_agents: int
Number of current agents.
Read from the length of the agents property if the wrapped environment doesn’t define it.
- property num_envs: int
Number of environments.
If the wrapped environment does not have the num_envs property, it will be set to 1.
- property possible_agents: list[str]
Names of all possible agents the environment could generate.
These cannot be changed as an environment progresses.
- property state_spaces: dict[str, gymnasium.Space | None]
State spaces.
Although this property returns a dictionary, the space for each agent follows these rules:
- The wrapped environment has the state_space attribute (homogeneous state). The state is a global view of the environment, so the space is the same for all agents.
- The wrapped environment has the state_spaces attribute (heterogeneous state). The state may differ for each agent, so the agent spaces may also differ.
- The wrapped environment does not have the previous attributes. The space is None for all agents.
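The distinction between `agents` (which may change during an episode) and `possible_agents` (which stays fixed) can be illustrated with a small bookkeeping class. `AgentRoster` is a hypothetical helper written for this sketch; it is not part of skrl and says nothing about how a particular environment adds or removes agents.

```python
class AgentRoster:
    """Hypothetical helper (not part of skrl) tracking current vs. possible agents."""

    def __init__(self, possible_agents):
        self._possible_agents = tuple(possible_agents)  # fixed for the environment's lifetime
        self.agents = list(possible_agents)  # may shrink or grow during an episode

    @property
    def possible_agents(self):
        return list(self._possible_agents)

    @property
    def num_agents(self):
        return len(self.agents)

    @property
    def max_num_agents(self):
        return len(self._possible_agents)

    def remove(self, agent):
        """Drop an agent from the current list, e.g. once it has finished its episode."""
        self.agents.remove(agent)


roster = AgentRoster(["robot_0", "robot_1"])
roster.remove("robot_0")
```

After the removal, `roster.agents` holds only "robot_1" while `roster.possible_agents` still lists both names, so `num_agents` and `max_num_agents` diverge.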
- class skrl.envs.wrappers.jax.pettingzoo_envs.PettingZooWrapper(env: Any)
Bases: MultiAgentEnvWrapper
PettingZoo (Parallel API) environment wrapper.
- Parameters:
env – The environment instance to wrap.
Methods:
action_space(agent) | Action space.
close() | Close the environment.
observation_space(agent) | Observation space.
render(*args, **kwargs) | Render the environment.
reset() | Reset the environment.
state() | Get the environment state.
state_space(agent) | State space.
step(actions) | Perform a step in the environment.
Attributes:
action_spaces | Action spaces.
agents | Names of all current agents.
device | The device used by the environment.
max_num_agents | Number of possible agents the environment could generate.
num_agents | Number of current agents.
num_envs | Number of environments.
observation_spaces | Observation spaces.
possible_agents | Names of all possible agents the environment could generate.
state_spaces | State spaces.
- action_space(agent: str) -> gymnasium.Space
Action space.
- Parameters:
agent – Name of the agent.
- Returns:
The action space for the specified agent.
- observation_space(agent: str) -> gymnasium.Space
Observation space.
- Parameters:
agent – Name of the agent.
- Returns:
The observation space for the specified agent.
- reset() -> tuple[dict[str, jax.Array], dict[str, Any]]
Reset the environment.
- Returns:
Observation, info.
- state() -> dict[str, jax.Array | None]
Get the environment state.
In PettingZoo, the state is a global view of the environment, so it is the same for all agents.
- Returns:
State.
- state_space(agent: str) -> gymnasium.Space | None
State space.
See state_spaces for more details.
- Parameters:
agent – Name of the agent.
- Returns:
The state space for the specified agent.
- step(actions: dict[str, jax.Array]) -> tuple[dict[str, jax.Array], dict[str, jax.Array], dict[str, jax.Array], dict[str, jax.Array], dict[str, Any]]
Perform a step in the environment.
- Parameters:
actions – The actions to perform.
- Returns:
Observation, reward, terminated, truncated, info.
- property agents: list[str]
Names of all current agents.
These may change as an environment progresses (i.e. agents can be added or removed).
- property device: jax.Device
The device used by the environment.
If the wrapped environment does not have the device property, the value of this property will be "cuda" or "cpu" depending on the device availability.
- property max_num_agents: int
Number of possible agents the environment could generate.
Read from the length of the possible_agents property if the wrapped environment doesn’t define it.
- property num_agents: int
Number of current agents.
Read from the length of the agents property if the wrapped environment doesn’t define it.
- property num_envs: int
Number of environments.
If the wrapped environment does not have the num_envs property, it will be set to 1.
- property possible_agents: list[str]
Names of all possible agents the environment could generate.
These cannot be changed as an environment progresses.
- property state_spaces: dict[str, gymnasium.Space | None]
State spaces.
Although this property returns a dictionary, the space for each agent follows these rules:
- The wrapped environment has the state_space attribute (homogeneous state). The state is a global view of the environment, so the space is the same for all agents.
- The wrapped environment has the state_spaces attribute (heterogeneous state). The state may differ for each agent, so the agent spaces may also differ.
- The wrapped environment does not have the previous attributes. The space is None for all agents.