[ot][spam] Behavior Log For Control Data: HFRL Unit 1 Lab

Fri Jun 24 06:23:15 PDT 2022

Here is my code:

# TODO: Evaluate the agent
# Create a new environment for evaluation
import stable_baselines3.common.env_util
eval_env = stable_baselines3.common.env_util.make_vec_env(
    'LunarLander-v2', n_envs=4)

# Evaluate the model with 10 evaluation episodes and deterministic=True
import stable_baselines3.common.evaluation
mean_reward, std_reward = (
    stable_baselines3.common.evaluation.evaluate_policy(
        model, eval_env, n_eval_episodes=10, deterministic=True))

# Print the results
print(f'Rewards: mean={mean_reward} std={std_reward}')

The model finished training, and I ran it.
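I first thought it would do a total of 40 episodes because I passed a
vectorised environment with n_envs=4, but as far as I can tell
evaluate_policy treats n_eval_episodes as the total and splits the 10
episodes across the 4 environments.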
It reported a mean reward of around 251 with a standard deviation of
around 20.5.
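
On the episode count: below is a minimal sketch of how, as I understand
it, recent stable-baselines3 releases divide n_eval_episodes over the
sub-environments of a vectorised environment. The helper name
episode_targets is my own and is not part of the library, and older
releases may behave differently, so treat this as an assumption to check
against the installed version. If it holds, the run above evaluates 10
episodes in total rather than 40.

```python
# Sketch only: mirrors how I understand recent stable-baselines3
# releases split n_eval_episodes over a vectorised environment.
# episode_targets is a hypothetical helper, not a library function.

def episode_targets(n_eval_episodes, n_envs):
    """Return how many episodes each sub-environment would run."""
    return [(n_eval_episodes + i) // n_envs for i in range(n_envs)]

# Same numbers as the evaluation above: 10 episodes, 4 environments.
targets = episode_targets(n_eval_episodes=10, n_envs=4)
total_episodes = sum(targets)
print(f'per-env targets: {targets} total: {total_episodes}')
```

The per-environment targets always add up to n_eval_episodes, so the
reported mean and std would be taken over 10 episode rewards.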