   Like an RL model, I have minimal working memory nowadays, so I'll need
   some docs to work through this challenge. The lab says to read them.
   The challenge is to properly instantiate a PPO model with an MlpPolicy
   and then train it on the Lunar Lander gym environment for 500k
   timesteps.
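   In SB3 the challenge itself comes down to roughly
   `model = PPO("MlpPolicy", env)` followed by
   `model.learn(total_timesteps=500_000)`. One detail worth knowing: PPO
   gathers experience in rollouts of n_steps transitions per environment
   (SB3's default is n_steps=2048), and as far as I can tell from SB3's
   on-policy training loop, learn() keeps collecting whole rollouts until
   its timestep counter reaches total_timesteps, so the budget gets
   rounded up. The sketch below only does that arithmetic under those
   assumptions; `ppo_rollout_budget` is a hypothetical helper, not part of
   SB3, and the 16-environment setting is just an illustration.

```python
def ppo_rollout_budget(total_timesteps: int, n_steps: int = 2048, n_envs: int = 1) -> tuple[int, int]:
    """Return (rollouts, timesteps actually collected) for an on-policy budget.

    Hypothetical helper (not an SB3 function). It assumes the training loop
    collects complete rollouts of n_steps per environment until the timestep
    counter reaches total_timesteps, so the budget is rounded up.
    """
    rollout_size = n_steps * n_envs  # transitions gathered per rollout
    # Integer ceiling division: number of full rollouts needed to reach the budget
    rollouts = (total_timesteps + rollout_size - 1) // rollout_size
    return rollouts, rollouts * rollout_size


if __name__ == "__main__":
    # Assumed SB3 PPO default: n_steps=2048, single environment
    print(ppo_rollout_budget(500_000))  # (245, 501760)
    # Illustrative alternative: 16 parallel envs with n_steps=1024
    print(ppo_rollout_budget(500_000, n_steps=1024, n_envs=16))  # (31, 507904)
```

   Either way the overshoot past 500k is small; what changes with more
   parallel environments is how many policy updates you get, since each
   rollout is followed by one round of PPO training.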
   Lunar Lander environment documentation (check it before using the
   environment):
   [1]https://www.gymlibrary.ml/environments/box2d/lunar_lander
   Stable Baselines 3 documentation (dive in and try some tutorials):
   [2]https://stable-baselines3.readthedocs.io/en/master
   SB3 PPO documentation (I initially left this link out; you'll study
   PPO during this course):
   [3]https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html#example

References

   1. https://www.gymlibrary.ml/environments/box2d/lunar_lander
   2. https://stable-baselines3.readthedocs.io/en/master
   3. https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html#example