[ot][spam] Behavior Log For Control Data: HFRL Unit 1 Lab

Fri Jun 24 05:57:40 PDT 2022

0853 here is what I have. I did not look up the terms I was unsure of;
I will instead move on with the lab.

# TODO: Define a PPO MlpPolicy architecture
# We use MlpPolicy (a multi-layer perceptron policy) because the input is
# a vector; if we had image frames as input we would use CnnPolicy.
import stable_baselines3

# `env` is assumed to have been created earlier in the lab.
model = stable_baselines3.PPO(
    'MlpPolicy',    # vector input; CnnPolicy is for images
    env,            # environment (or vectorized environments) to train on
    verbose=1,      # 0 = silent, 1 = print training info, 2 = debug output
    n_steps=1024,   # steps collected in each (parallel) environment before each update
    batch_size=64,  # minibatch size: samples used for each gradient step during an update
    n_epochs=4,     # number of passes over the collected rollout data in each update
    gamma=0.999,    # discount factor for future rewards (general RL, not PPO-specific)
)
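A rough sketch in plain Python (not stable_baselines3 itself) of what n_steps, batch_size, n_epochs and gamma control. Assumptions: each update works on a rollout buffer of n_steps * n_envs samples that is reshuffled every epoch and split into minibatches of batch_size; n_envs=16 is a hypothetical value, since this log doesn't say how many parallel environments there are. The plain discounted return below only illustrates what gamma does; PPO's advantage estimates use GAE, which also depends on gae_lambda.

```python
import random


def minibatch_schedule(n_steps, n_envs, batch_size, n_epochs, seed=0):
    """Yield (epoch, indices): each epoch shuffles the whole rollout buffer
    and splits it into minibatches of `batch_size` samples."""
    buffer_size = n_steps * n_envs  # samples collected before each update
    rng = random.Random(seed)
    for epoch in range(n_epochs):
        indices = list(range(buffer_size))
        rng.shuffle(indices)
        for start in range(0, buffer_size, batch_size):
            yield epoch, indices[start:start + batch_size]


def discounted_returns(rewards, gamma):
    """Discounted return G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Values from the lab; n_envs=16 is a hypothetical choice.
schedule = list(minibatch_schedule(n_steps=1024, n_envs=16,
                                   batch_size=64, n_epochs=4))
print(len(schedule))  # gradient steps per update: 4 * (1024 * 16 / 64) = 1024

# A reward of 1 two steps in the future, discounted with gamma = 0.999.
print([round(g, 6) for g in discounted_returns([0.0, 0.0, 1.0], gamma=0.999)])
# prints [0.998001, 0.999, 1.0]
```

With gamma this close to 1, rewards far in the future still count almost fully, which suits tasks where the payoff comes at the end of a long episode.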