[ot][spam] Behavior Log For Control Data: HFRL Unit 1 Lab
Undiscussed Horrific Abuse, One Victim of Many
gmkarl at gmail.com
Fri Jun 24 05:52:47 PDT 2022
0851
This is the solution I filled in:
# TODO: Define a PPO MlpPolicy architecture
# We use a multilayer perceptron (MlpPolicy) because the input is a vector;
# if we had frames as input we would use CnnPolicy
import stable_baselines3
model = stable_baselines3.PPO('MlpPolicy', env, verbose=1)
This is the solution they provide:
# SOLUTION
# We added some parameters to speed up the training
model = PPO(
    policy='MlpPolicy',
    env=env,
    n_steps=1024,
    batch_size=64,
    n_epochs=4,
    gamma=0.999,
    gae_lambda=0.98,
    ent_coef=0.01,
    verbose=1)
I will copy their parameters over to my code, thinking briefly about
each one. I recognise three of them; I recall that some of the others
were mentioned in the learning material, but I do not remember what
they are.