[ot][spam] Behavior Log For Control Data: HFRL Unit 1 Lab
Undiscussed Horrific Abuse, One Victim of Many
gmkarl at gmail.com
Fri Jun 24 06:23:15 PDT 2022
Here is my code:
# TODO: Evaluate the agent
# Create a new environment for evaluation
import stable_baselines3.common.env_util
eval_env = stable_baselines3.common.env_util.make_vec_env('LunarLander-v2', n_envs=4)
# Evaluate the model with 10 evaluation episodes and deterministic=True
import stable_baselines3.common.evaluation
mean_reward, std_reward = stable_baselines3.common.evaluation.evaluate_policy(
    model, eval_env, n_eval_episodes=10, deterministic=True)
# Print the results
print(f'Rewards: mean={mean_reward} std={std_reward}')
The model finished training, and I ran it.
I think it does a total of 40 episodes because I passed a vectorised
environment.
It displayed a mean reward of around 251 and a standard deviation of around 20.5.
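
A way to check the 40-episode guess, as a sketch only: as far as I know, recent stable_baselines3 releases treat n_eval_episodes as a total and divide it among the sub-environments as evenly as possible, rather than running that many episodes in each one. The helper below reproduces that split under this assumption (split_episodes is a made-up name, not a library function). If the assumption holds, the evaluation above covered 10 episodes in total, not 40.

```python
# Hypothetical helper, not part of stable_baselines3: mirrors how
# evaluate_policy is assumed to share episodes among sub-environments.
def split_episodes(n_eval_episodes, n_envs):
    # Each sub-environment gets the floor share; the remainder goes to
    # the last few sub-environments, one extra episode each.
    return [(n_eval_episodes + i) // n_envs for i in range(n_envs)]

targets = split_episodes(10, 4)
print('per-env episode targets:', targets)
print('total episodes:', sum(targets))
```

Passing return_episode_rewards=True to evaluate_policy returns the per-episode rewards and lengths instead of the mean and std, so the length of that list would confirm the actual count for the installed version.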