0853 Here is what I have. I did not look up the terms I was unsure of; I will instead move on with the lab.

# TODO: Define a PPO MlpPolicy architecture
# We use a MultiLayerPerceptron (MlpPolicy) because the input is a vector;
# if we had frames as input we would use CnnPolicy.
import stable_baselines3

model = stable_baselines3.PPO(
    'MlpPolicy',       # vector input; CnnPolicy is for image observations
    env,               # the environment the agent interacts with
    verbose=1,         # print training information
    n_steps=1024,      # number of steps collected per parallel environment before each update
    batch_size=64,     # minibatch size for each gradient update; larger batches give
                       # smoother gradient estimates but use more memory
    n_epochs=4,        # number of passes over the collected rollout data per update
    gamma=0.999,       # discount factor: how much future rewards count relative to immediate ones
)
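To see where env comes from and how training is kicked off, here is a minimal sketch of the surrounding setup. The environment id, n_envs, and total_timesteps below are placeholders I picked for illustration, not values taken from the lab.

# Minimal sketch (assumptions: "LunarLander-v2", n_envs=16, total_timesteps are illustrative)
import stable_baselines3
from stable_baselines3.common.env_util import make_vec_env

# Build a vectorized environment so PPO can collect rollouts from several copies in parallel.
env = make_vec_env("LunarLander-v2", n_envs=16)

model = stable_baselines3.PPO(
    'MlpPolicy', env, verbose=1,
    n_steps=1024, batch_size=64, n_epochs=4, gamma=0.999,
)

# Train for a placeholder budget of timesteps, then save the resulting policy.
model.learn(total_timesteps=1_000_000)
model.save("ppo_model")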