In Colab, you have to click the little [ ] buttons next to the code
   blocks to run them, and this only works if you run them in order from
   top to bottom, so that the imports and other definitions exist by the
   time later cells need them.
   I ran the example, and it took a bunch of random actions, but the
   episode did not end: the lander was still in the air after 20 random
   actions.
   In Step 4, which is where I am now in the colab notebook, an agent is
   trained to land correctly on the moon.
   A link is given to the documentation for the LunarLander environment:
   [1]https://www.gymlibrary.ml/environments/box2d/lunar_lander/
   It says it's good to check the documentation for an environment before
   starting to use it.
   We can add that to the homework: check the LunarLander environment
   documentation before doing any work of one's own on a lander model.
   Next, here's the code for reviewing the environment:
   import gym  # LunarLander also needs Box2D, e.g. pip install gym[box2d]
   env = gym.make('LunarLander-v2')
   env.reset()
   print('_____OBSERVATION SPACE_____ \n')
   print('Observation Space Shape', env.observation_space.shape)
   print('Sample observation', env.observation_space.sample())  # get a random observation
   The output shows the observation space is a vector of 8 floats. That's
   all the input the agent gets.
   The floats are:
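   - lander X (horizontal) coordinate, relative to the landing pad
   - lander Y (vertical) coordinate, relative to the landing pad
   - lander speed X (horizontal)
   - lander speed Y (vertical)
   - lander angle
   - lander angular speed
   - left leg contact (1 if the leg touches the ground, else 0)
   - right leg contact (1 if the leg touches the ground, else 0)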
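   To make the layout concrete, here is a minimal sketch that gives the
   eight entries of an observation readable names, assuming they arrive in
   the order listed above. The LanderObservation type and describe() helper
   are hypothetical illustrations, not part of gym.

```python
from typing import NamedTuple, Sequence


class LanderObservation(NamedTuple):
    """Hypothetical named view of the 8-float LunarLander observation,
    assuming the field order listed above."""
    x: float              # horizontal coordinate
    y: float              # vertical coordinate
    vx: float             # horizontal speed
    vy: float             # vertical speed
    angle: float          # lander angle
    angular_speed: float  # lander angular speed
    left_leg: float       # 1.0 if the left leg touches the ground, else 0.0
    right_leg: float      # 1.0 if the right leg touches the ground, else 0.0


def describe(obs: Sequence[float]) -> LanderObservation:
    """Wrap a raw observation (list, tuple or NumPy array) in named fields."""
    if len(obs) != 8:
        raise ValueError(f"expected 8 values, got {len(obs)}")
    return LanderObservation(*(float(v) for v in obs))


if __name__ == "__main__":
    # Made-up numbers, only to show named field access.
    o = describe([0.0, 1.4, 0.1, -0.2, 0.05, 0.0, 0.0, 0.0])
    print(o.y, o.vy, bool(o.left_leg))
```

   With a real environment, describe(env.reset()) or describe(obs) from a
   step would give the same named view, assuming the observation is a
   plain 8-element array.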
   print('\n _____ACTION SPACE_____ \n')
   print('Action Space Shape', env.action_space.n)  # number of discrete actions
   print('Action Space Sample', env.action_space.sample())  # random action
   The output shows the action space is discrete: an action is an integer
   in the range [0, 4), i.e. one of 0, 1, 2 or 3.
   These integers are:
   - 0: do nothing
   - 1: fire left orientation engine
   - 2: fire main engine
   - 3: fire right orientation engine
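   Because the action space is discrete, an action is just an index into
   the list above. A minimal sketch, assuming the actions are numbered 0 to
   3 in the order listed; ACTION_NAMES and random_action() are hypothetical
   helpers, and random.randrange(4) only mimics the kind of value
   env.action_space.sample() returns, it is not gym's implementation.

```python
import random

# Hypothetical lookup table, assuming actions 0-3 follow the order listed above.
ACTION_NAMES = {
    0: "do nothing",
    1: "fire left orientation engine",
    2: "fire main engine",
    3: "fire right orientation engine",
}


def random_action(rng: random.Random) -> int:
    """Pick an action uniformly from [0, 4), i.e. one of 0, 1, 2, 3."""
    return rng.randrange(len(ACTION_NAMES))


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the run is repeatable
    for _ in range(5):
        a = random_action(rng)
        print(a, ACTION_NAMES[a])
```

   A uniformly random policy like this is roughly what the earlier example
   did, which is why the lander drifted instead of landing.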
   The lab text then describes the reward for each timestep, which, as I
   complained earlier, is embedded within the environment.
   - Moving from the top of the screen to the landing pad and coming to
   zero speed is worth about 100-140 points.
   - Firing the main engine costs 0.3 points per frame.
   - Each leg in contact with the ground is worth +10 points.
   - The episode finishes if the lander crashes (an additional -100
   points) or comes to rest (an additional +100 points).
   - The game is solved if your agent scores at least 200 points.
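   The fixed terms above can be tallied in a toy calculation. This is a
   simplified sketch, not the environment's actual reward code: it leaves
   out the shaping reward for approaching the pad and only adds up the
   listed constants, using hypothetical inputs (how many frames the main
   engine fired, how many legs touched down, and how the episode ended).

```python
# Simplified, hypothetical tally of the fixed reward terms listed above.
# It omits the shaping reward for moving toward the pad, so it is NOT
# what LunarLander computes internally.
MAIN_ENGINE_COST = 0.3   # points lost per frame the main engine fires
LEG_CONTACT_BONUS = 10   # points per leg touching the ground
CRASH_PENALTY = -100     # extra points when the episode ends in a crash
REST_BONUS = 100         # extra points when the lander comes to rest
SOLVED_THRESHOLD = 200   # score at which an episode counts as solved


def fixed_terms(main_engine_frames: int, legs_down: int, outcome: str) -> float:
    """Sum the fixed terms; outcome is 'crash', 'rest' or 'none'."""
    if outcome not in ("crash", "rest", "none"):
        raise ValueError(f"unknown outcome: {outcome}")
    if not 0 <= legs_down <= 2:
        raise ValueError("the lander has only two legs")
    total = -MAIN_ENGINE_COST * main_engine_frames
    total += LEG_CONTACT_BONUS * legs_down
    if outcome == "crash":
        total += CRASH_PENALTY
    elif outcome == "rest":
        total += REST_BONUS
    return total


def is_solved(score: float) -> bool:
    """An episode counts as solved at 200 points or more."""
    return score >= SOLVED_THRESHOLD


if __name__ == "__main__":
    # Assume the approach is worth about 120 points (inside 100-140).
    score = 120 + fixed_terms(main_engine_frames=60, legs_down=2, outcome="rest")
    print(round(score, 2), is_solved(score))
```

   Under these made-up numbers the landing scores 120 - 18 + 20 + 100 =
   222, which clears the threshold; the same approach ending in a crash
   would not.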

References

   1. https://www.gymlibrary.ml/environments/box2d/lunar_lander/