Like an RL model, I have minimal working memory nowadays.

So I'll need some docs to get this model working. The lab says to read them.

The challenge is to instantiate a PPO model with an MlpPolicy and then train it on the Lunar Lander gym environment for 500k timesteps.

Lunar Lander environment documentation: "check the documentation"

Stable Baselines 3 documentation: "dive in and try some tutorials"

SB3 PPO documentation (I left this link out): "you'll study it during this course"