Manual training#

Train agents by manually controlling the training/evaluation loop.

Concept#

Usage#

# assuming there is an environment named 'env'
# and an agent named 'agents' (or a state-preprocessor and a policy)

states, infos = env.reset()

for i in range(1000):
    # state-preprocessor + policy
    with torch.no_grad():
        states = state_preprocessor(states)
        actions = policy.act({"states": states})[0]

    # step the environment
    next_states, rewards, terminated, truncated, infos = env.step(actions)

    # render the environment
    env.render()

    # check for termination/truncation
    if terminated.any() or truncated.any():
        states, infos = env.reset()
    else:
        states = next_states