[ot][spam] Behavior Log For Control Data: HFRL Unit 1 Lab
Undiscussed Horrific Abuse, One Victim of Many
gmkarl at gmail.com
Fri Jun 24 05:52:47 PDT 2022
0851
This is the solution I filled in:
# TODO: Define a PPO MlpPolicy architecture
# We use a multilayer perceptron (MlpPolicy) because the input is a vector;
# if we had frames as input we would use CnnPolicy
import stable_baselines3
model = stable_baselines3.PPO('MlpPolicy', env, verbose=1)
This is the solution they provide:
# SOLUTION
# We added some parameters to speed up the training
model = PPO(
    policy='MlpPolicy',
    env=env,
    n_steps=1024,
    batch_size=64,
    n_epochs=4,
    gamma=0.999,
    gae_lambda=0.98,
    ent_coef=0.01,
    verbose=1)
I will copy their parameters over to my code, thinking briefly about
each one. I recognise three of them; I recall that some of the others
were mentioned in the learning material, but I do not remember what
they are.