24 Jun
2022
1:23 p.m.
Here is my code:

    # TODO: Evaluate the agent
    from stable_baselines3.common.env_util import make_vec_env
    from stable_baselines3.common.evaluation import evaluate_policy

    # Create a new environment for evaluation
    eval_env = make_vec_env('LunarLander-v2', n_envs=4)

    # Evaluate the model with 10 evaluation episodes and deterministic=True
    # (`model` is the agent trained earlier)
    mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)

    # Print the results
    print(f'Rewards: mean={mean_reward} std={std_reward}')

The model finished training, and I ran the evaluation. I think it runs a total of 40 episodes because I passed a vectorised environment with 4 sub-environments. It reported a mean reward of around 251 and a standard deviation of around 20.5.
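To sanity-check the episode count: as far as I can tell from the `evaluate_policy` source in recent Stable-Baselines3 releases, `n_eval_episodes` is the total across the whole vectorised environment, split as evenly as possible among the sub-environments, and not a per-environment count. The sketch below reproduces only that split in plain Python. `episode_targets` is a hypothetical helper name used for illustration, not part of the library, and older versions may behave differently.

```python
def episode_targets(n_eval_episodes: int, n_envs: int) -> list:
    """Episodes assigned to each sub-environment.

    Assumption: mirrors the even split used by recent SB3 versions of
    evaluate_policy; this helper itself is hypothetical.
    """
    return [(n_eval_episodes + i) // n_envs for i in range(n_envs)]


targets = episode_targets(n_eval_episodes=10, n_envs=4)
print(targets, sum(targets))  # [2, 2, 3, 3] 10
```

If that reading holds for the installed version, a call with `n_eval_episodes=10` and `n_envs=4` evaluates 10 episodes in total, not 40. Passing `return_episode_rewards=True` to `evaluate_policy` returns the per-episode rewards and lengths instead of the mean and std, so the length of the rewards list confirms the count directly.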