Adversarial Motion Priors (AMP)
AMP is a model-free, stochastic, on-policy policy-gradient algorithm (trained using a combination of GAIL and PPO) for the adversarial learning of physics-based character animation. It enables characters to imitate diverse behaviors from large unstructured datasets, without the need for motion planners or other mechanisms for clip selection.
Paper: AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control
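As a rough illustration of the adversarial setup (a hedged, pure-Python sketch, not the library's implementation: the function names and the 1e-4 clipping value are assumptions): the discriminator's output is mapped to a style reward, which is then mixed with the task reward using the task-reward weight (wG) and style-reward weight (wS) from the configuration below.

```python
import math

def style_reward(discriminator_logit, discriminator_reward_scale=2.0):
    # map the discriminator logit to a probability and derive the style reward
    # as -log(1 - D(s)); the 1e-4 clipping value is an assumption for numerical stability
    prob = 1.0 / (1.0 + math.exp(-discriminator_logit))
    return -math.log(max(1.0 - prob, 1e-4)) * discriminator_reward_scale

def total_reward(task_reward, style_reward_value,
                 task_reward_weight=0.0, style_reward_weight=1.0):
    # weighted mixture of the task (goal) reward and the style reward:
    # r = wG * rG + wS * rS
    return task_reward_weight * task_reward + style_reward_weight * style_reward_value
```

With the default weights (wG = 0, wS = 1) the policy is trained purely on the style reward, i.e. on how well its motions fool the discriminator.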
Algorithm implementation
- Learning algorithm: _update(...)
- Returns and advantages computation (GAE): compute_gae(...)
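The returns-and-advantages step can be sketched as follows (a framework-agnostic, pure-Python sketch of standard Generalized Advantage Estimation under the discount factor gamma and TD(lambda) coefficient from the configuration; the actual implementation operates on tensors and also standardizes the advantages):

```python
def compute_gae(rewards, dones, values, next_values,
                discount_factor=0.99, lambda_coefficient=0.95):
    """Compute returns and advantages with Generalized Advantage Estimation (sketch)."""
    advantages = [0.0] * len(rewards)
    advantage = 0.0
    # iterate backwards through the rollout
    for i in reversed(range(len(rewards))):
        next_value = values[i + 1] if i < len(rewards) - 1 else next_values
        not_done = 0.0 if dones[i] else 1.0
        # TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[i] + discount_factor * not_done * next_value - values[i]
        # GAE: exponentially weighted sum of TD errors
        advantage = delta + discount_factor * lambda_coefficient * not_done * advantage
        advantages[i] = advantage
    # returns serve as value-function regression targets
    returns = [a + v for a, v in zip(advantages, values)]
    return returns, advantages
```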
Configuration and hyperparameters
- skrl.agents.torch.amp.amp.AMP_DEFAULT_CONFIG
AMP_DEFAULT_CONFIG = {
    "rollouts": 16,                 # number of rollouts before updating
    "learning_epochs": 6,           # number of learning epochs during each update
    "mini_batches": 2,              # number of mini batches during each learning epoch

    "discount_factor": 0.99,        # discount factor (gamma)
    "lambda": 0.95,                 # TD(lambda) coefficient (lam) for computing returns and advantages

    "learning_rate": 5e-5,                  # learning rate
    "learning_rate_scheduler": None,        # learning rate scheduler class (see torch.optim.lr_scheduler)
    "learning_rate_scheduler_kwargs": {},   # learning rate scheduler's kwargs (e.g. {"step_size": 1e-3})

    "state_preprocessor": None,             # state preprocessor class (see skrl.resources.preprocessors)
    "state_preprocessor_kwargs": {},        # state preprocessor's kwargs (e.g. {"size": env.observation_space})
    "value_preprocessor": None,             # value preprocessor class (see skrl.resources.preprocessors)
    "value_preprocessor_kwargs": {},        # value preprocessor's kwargs (e.g. {"size": 1})
    "amp_state_preprocessor": None,         # AMP state preprocessor class (see skrl.resources.preprocessors)
    "amp_state_preprocessor_kwargs": {},    # AMP state preprocessor's kwargs (e.g. {"size": env.amp_observation_space})

    "random_timesteps": 0,          # random exploration steps
    "learning_starts": 0,           # learning starts after this many steps

    "grad_norm_clip": 0.0,              # clipping coefficient for the norm of the gradients
    "ratio_clip": 0.2,                  # clipping coefficient for computing the clipped surrogate objective
    "value_clip": 0.2,                  # clipping coefficient for computing the value loss (if clip_predicted_values is True)
    "clip_predicted_values": False,     # clip predicted values during value loss computation

    "entropy_loss_scale": 0.0,          # entropy loss scaling factor
    "value_loss_scale": 2.5,            # value loss scaling factor
    "discriminator_loss_scale": 5.0,    # discriminator loss scaling factor

    "amp_batch_size": 512,              # batch size for updating the reference motion dataset
    "task_reward_weight": 0.0,          # task-reward weight (wG)
    "style_reward_weight": 1.0,         # style-reward weight (wS)
    "discriminator_batch_size": 0,      # batch size for computing the discriminator loss (all samples if 0)
    "discriminator_reward_scale": 2,    # discriminator reward scaling factor
    "discriminator_logit_regularization_scale": 0.05,   # logit regularization scale factor for the discriminator loss
    "discriminator_gradient_penalty_scale": 5,          # gradient penalty scaling factor for the discriminator loss
    "discriminator_weight_decay_scale": 0.0001,         # weight decay scaling factor for the discriminator loss

    "rewards_shaper": None,         # rewards shaping function: Callable(reward, timestep, timesteps) -> reward

    "experiment": {
        "directory": "",            # experiment's parent directory
        "experiment_name": "",      # experiment name
        "write_interval": 250,      # TensorBoard writing interval (timesteps)

        "checkpoint_interval": 1000,    # interval for checkpoints (timesteps)
        "store_separately": False,      # whether to store checkpoints separately

        "wandb": False,             # whether to use Weights & Biases
        "wandb_kwargs": {}          # wandb kwargs (see https://docs.wandb.ai/ref/python/init)
    }
}
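To adjust these hyperparameters, a common pattern is to copy the default dictionary and override individual entries before passing it to the agent's constructor (a sketch; the remaining constructor arguments such as the models, memory and spaces are omitted here):

```python
import copy

from skrl.agents.torch.amp import AMP, AMP_DEFAULT_CONFIG

# deep-copy so that nested dictionaries such as "experiment" are not shared
# with the module-level defaults
cfg = copy.deepcopy(AMP_DEFAULT_CONFIG)
cfg["rollouts"] = 32
cfg["learning_rate"] = 1e-5
cfg["task_reward_weight"] = 0.5   # wG
cfg["style_reward_weight"] = 0.5  # wS
cfg["experiment"]["write_interval"] = 100

# the configuration is then passed to the agent's constructor, e.g.:
# agent = AMP(models=models, memory=memory, cfg=cfg, ...)
```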
Spaces and models
The implementation supports the following Gym spaces / Gymnasium spaces, where \(\blacksquare\) indicates a supported space and \(\square\) an unsupported one:

| Gym/Gymnasium spaces | AMP observation | Observation | Action |
|---|---|---|---|
| Discrete | \(\square\) | \(\square\) | \(\square\) |
| Box | \(\blacksquare\) | \(\blacksquare\) | \(\blacksquare\) |
| Dict | \(\square\) | \(\square\) | \(\square\) |
The implementation uses 1 stochastic (continuous) and 2 deterministic function approximators. These function approximators (models) must be collected in a dictionary and passed to the constructor of the class under the argument models:

| Notation | Concept | Key | Input shape | Output shape | Type |
|---|---|---|---|---|---|
| \(\pi_\theta(s)\) | Policy | "policy" | observation | action | Gaussian / MultivariateGaussian |
| \(V_\phi(s)\) | Value | "value" | observation | 1 | Deterministic |
| \(D_\psi(s_{_{AMP}})\) | Discriminator | "discriminator" | AMP observation | 1 | Deterministic |
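A minimal sketch of how the models dictionary is assembled (the "policy", "value" and "discriminator" keys follow skrl's naming; the placeholder callables below are hypothetical stand-ins for real skrl model instances, which subclass skrl.models.torch.Model with the appropriate mixins):

```python
# hypothetical stand-ins for skrl Model instances; the hard-coded output
# sizes below are only illustrative
policy = lambda states: [0.0] * 8         # observation -> action (stochastic)
value = lambda states: [0.0]              # observation -> 1 (deterministic)
discriminator = lambda amp_states: [0.0]  # AMP observation -> 1 (deterministic)

# the three function approximators are collected under these keys
models = {
    "policy": policy,
    "value": value,
    "discriminator": discriminator,
}
```

The dictionary is then passed to the agent's constructor via the models argument.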
Support for advanced features is described in the next table:

| Feature | Support and remarks |
|---|---|
| Shared model | - |
| RNN support | - |