0851

This is the solution I filled in:

```python
# TODO: Define a PPO MlpPolicy architecture
# We use MultiLayerPerceptron (MLPPolicy) because the input is a vector,
# if we had frames as input we would use CnnPolicy
import stable_baselines3

model = stable_baselines3.PPO('MlpPolicy', env, verbose=1)
```

This is the solution they provide:

```python
# SOLUTION
# We added some parameters to fasten the training
model = PPO(
    policy='MlpPolicy',
    env=env,
    n_steps=1024,
    batch_size=64,
    n_epochs=4,
    gamma=0.999,
    gae_lambda=0.98,
    ent_coef=0.01,
    verbose=1,
)
```

I will copy their parameters over to my code, thinking briefly about each one. I recognise three of them. I recall that some of the others were mentioned in the learning material, but I do not remember what they are.
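For reference, here is a sketch of what my code looks like after copying their parameters over, keeping my original `import stable_baselines3` style. It assumes `env` was created earlier in the notebook; the inline comments are my own reading of what each Stable-Baselines3 PPO parameter controls, not part of either solution above.

```python
import stable_baselines3

# `env` is assumed to have been created earlier in the notebook.
model = stable_baselines3.PPO(
    policy='MlpPolicy',   # MLP policy, since the observation is a flat vector, not frames
    env=env,
    n_steps=1024,         # steps collected per environment before each policy update
    batch_size=64,        # minibatch size used when optimising the surrogate loss
    n_epochs=4,           # optimisation passes over each collected rollout
    gamma=0.999,          # discount factor for future rewards
    gae_lambda=0.98,      # lambda for Generalized Advantage Estimation
    ent_coef=0.01,        # entropy bonus coefficient, encourages exploration
    verbose=1,
)
```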