Like an RL model, I have minimal working memory nowadays.

So I'll need some docs to get through this modelling exercise. The lab says to read them.

The challenge is to properly instantiate a PPO MlpPolicy model, and then to train it on a gym environment for 500k timesteps.
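For my own reference, here is a minimal sketch of what that challenge might look like in code. It is not the lab's solution. The environment id `LunarLander-v2` is my assumption (newer Gymnasium releases register `LunarLander-v3` instead), and `train_ppo`, `ENV_ID`, and `TOTAL_TIMESTEPS` are names I made up for illustration. Everything else is left at the SB3 defaults.

```python
# Assumed environment id; newer Gymnasium releases register "LunarLander-v3" instead.
ENV_ID = "LunarLander-v2"
# Training budget stated in the challenge.
TOTAL_TIMESTEPS = 500_000


def train_ppo(env_id: str = ENV_ID, total_timesteps: int = TOTAL_TIMESTEPS):
    """Instantiate a PPO model with an MLP policy and train it (hypothetical helper)."""
    # Imported here so the heavy dependency is only loaded when training starts.
    from stable_baselines3 import PPO

    # SB3 accepts a registered environment id as a string and builds the env itself.
    model = PPO("MlpPolicy", env_id, verbose=1)
    model.learn(total_timesteps=total_timesteps)
    return model
```

Calling `train_ppo()` starts the full 500k-step run. Passing a much smaller `total_timesteps` first is a cheap way to check that the setup works, since Lunar Lander also needs the Box2D dependency installed.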

Lunar Lander environment documentation: https://www.gymlibrary.ml/environments/box2d/lunar_lander "check the documentation"

Stable Baselines 3 documentation: https://stable-baselines3.readthedocs.io/en/master "dive in and try some tutorials"

SB3 PPO documentation, I left this link out: https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html#example "you'll study it during this course"