Manual training
Train agents by manually controlling the training/evaluation loop.
Concept
Usage
import torch

# assuming there is an environment named 'env'
# and a state preprocessor and a policy named 'state_preprocessor' and 'policy'
# (for example, those belonging to an agent)
states, infos = env.reset()

for i in range(1000):
    # state preprocessor + policy (no gradients are needed to select actions)
    with torch.no_grad():
        states = state_preprocessor(states)
        actions = policy.act({"states": states})[0]

    # step the environment
    next_states, rewards, terminated, truncated, infos = env.step(actions)

    # render the environment
    env.render()

    # check for termination/truncation: reset if any environment finished
    if terminated.any() or truncated.any():
        states, infos = env.reset()
    else:
        states = next_states
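
The control flow of the loop above can be tried without a simulator or a trained model. The following is a minimal sketch under stated assumptions: `ToyVectorEnv`, `toy_policy`, and `evaluate` are hypothetical stand-ins written for this illustration and are not part of any library. The environment only mimics the `reset()` / `step()` return shapes used above (with plain Python lists instead of tensors), and the episode counter and reward sum are additions for illustration.

```python
class ToyVectorEnv:
    """Hypothetical stand-in for a vectorized environment (not a library class)."""

    def __init__(self, num_envs=2, max_steps=5):
        self.num_envs = num_envs
        self.max_steps = max_steps
        self.steps = 0
        self.states = [0.0] * num_envs

    def reset(self):
        # Start a new episode in every sub-environment.
        self.steps = 0
        self.states = [0.0] * self.num_envs
        return list(self.states), {}

    def step(self, actions):
        # Move each state by its action and give a constant reward of 1.0.
        self.steps += 1
        self.states = [s + a for s, a in zip(self.states, actions)]
        rewards = [1.0] * self.num_envs
        # This toy environment never terminates; it only truncates on a time limit.
        terminated = [False] * self.num_envs
        truncated = [self.steps >= self.max_steps] * self.num_envs
        return list(self.states), rewards, terminated, truncated, {}


def toy_policy(states):
    """Hypothetical policy: always push every state by +1."""
    return [1.0 for _ in states]


def evaluate(env, policy, timesteps):
    """Run a manual evaluation loop and return (finished episodes, summed reward)."""
    states, infos = env.reset()
    episodes = 0
    total_reward = 0.0
    for _ in range(timesteps):
        # Select actions from the current states.
        actions = policy(states)
        # Step the environment.
        next_states, rewards, terminated, truncated, infos = env.step(actions)
        total_reward += sum(rewards)
        # Reset if any sub-environment finished, otherwise carry the states over.
        if any(terminated) or any(truncated):
            states, infos = env.reset()
            episodes += 1
        else:
            states = next_states
    return episodes, total_reward


episodes, total_reward = evaluate(ToyVectorEnv(), toy_policy, timesteps=20)
print(episodes, total_reward)
```

Keeping the loop in a function such as `evaluate` makes it easy to add per-episode bookkeeping (returns, lengths, logging) at the point where the reset happens, which is the main reason to control the loop by hand.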