In Colab, you have to click the little [ ] run buttons next to the code cells to run them, and this only works if you run them from top to bottom, so that the imports and other setup exist by the time later cells need them.

I ran the example, and it took a bunch of random actions, but the episode did not end: the lander was still in the air after 20 jet thrusts.
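Assuming the notebook's example loop simply ran for a fixed 20 steps, that would explain why the episode never finished: it stopped stepping before the lander crashed or landed. Below is a minimal sketch that keeps taking random actions until the environment reports that the episode is over. The function name `run_random_episode` and its `max_steps` cap are hypothetical, not from the notebook. Because Gym changed its `step()` return value between versions, the sketch accepts both the older 4-tuple and the newer 5-tuple form.

```python
def run_random_episode(env, max_steps=1000):
    """Take random actions until the episode ends or max_steps is reached.

    Works with the older Gym step API (obs, reward, done, info) and the
    newer one (obs, reward, terminated, truncated, info).
    Returns (total_reward, steps_taken, episode_ended).
    """
    env.reset()  # return value differs between Gym versions; unused here
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        action = env.action_space.sample()  # random action, as in the notebook example
        result = env.step(action)
        if len(result) == 5:  # newer API: terminated and truncated are separate flags
            _, reward, terminated, truncated, _ = result
            done = terminated or truncated
        else:  # older API: a single done flag
            _, reward, done, _ = result
        total_reward += float(reward)
        if done:
            return total_reward, step, True
    return total_reward, max_steps, False
```

In the notebook this would be called as `run_random_episode(env)` on the `env` created with `gym.make('LunarLander-v2')`; the loop then runs until the lander crashes, comes to rest, or the `max_steps` cap is hit.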

In Step 4, which is where I am now in the Colab notebook, an agent is trained to land correctly on the moon.

A link is given to the documentation for the LunarLander environment: https://www.gymlibrary.ml/environments/box2d/lunar_lander/

The notebook says it's good practice to check an environment's documentation before starting to use it.

We can add that to the homework: check the LunarLander environment documentation before doing any work of one's own on a lander model.

Next, here's the code for reviewing the environment:

```python
import gym

env = gym.make('LunarLander-v2')
env.reset()
print('_____OBSERVATION SPACE_____ \n')
print('Observation Space Shape', env.observation_space.shape)
print('Sample observation', env.observation_space.sample())  # get random observation
```

The output shows that the observation space is a vector of 8 floats. That's all the input the agent gets.

The floats are:
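- lander X (horizontal) coordinate, relative to the landing pad
- lander Y (vertical) coordinate, relative to the landing pad
- lander speed X (horizontal)
- lander speed Y (vertical)
- lander angle
- lander angular speed
- left leg contact (1 if touching the ground, else 0)
- right leg contact (1 if touching the ground, else 0)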
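To make a sampled observation easier to read while experimenting, a small helper can attach names to the 8 values. This is a sketch that assumes the ordering in the list above; the field names, `describe_observation`, and `legs_on_ground` are our own labels, not part of the gym API.

```python
# Our own labels for the 8 observation values, in the order listed above.
OBS_FIELDS = (
    "x",                  # horizontal position; the landing pad is at (0, 0)
    "y",                  # vertical position
    "vx",                 # horizontal speed
    "vy",                 # vertical speed
    "angle",              # lander angle
    "angular_speed",      # lander angular speed
    "left_leg_contact",   # 1.0 if the left leg touches the ground, else 0.0
    "right_leg_contact",  # 1.0 if the right leg touches the ground, else 0.0
)


def describe_observation(obs):
    """Return the observation as a {field_name: float} dict."""
    values = [float(v) for v in obs]  # works for plain lists and NumPy arrays
    if len(values) != len(OBS_FIELDS):
        raise ValueError(f"expected {len(OBS_FIELDS)} values, got {len(values)}")
    return dict(zip(OBS_FIELDS, values))


def legs_on_ground(obs):
    """Count how many legs report ground contact (0, 1 or 2)."""
    named = describe_observation(obs)
    return int(named["left_leg_contact"] > 0.5) + int(named["right_leg_contact"] > 0.5)
```

For example, `describe_observation(env.observation_space.sample())` would label a random observation from the cell above.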

```python
print('\n _____ACTION SPACE_____ \n')
print('Action Space Shape', env.action_space.n)
print('Action Space Sample', env.action_space.sample())  # random action
```

The output shows that the action space is discrete: an action is an integer in the range [0, 4).

These integers are:
- 0: do nothing
- 1: fire left orientation engine
- 2: fire main engine
- 3: fire right orientation engine
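When printing or logging actions, it can help to give the four integers readable names. Here is a minimal sketch using Python's `enum.IntEnum`; the member names and `action_name` are our own labels for the list above, not identifiers from gym.

```python
from enum import IntEnum


class LanderAction(IntEnum):
    """The four discrete LunarLander actions (member names are our own labels)."""
    DO_NOTHING = 0
    FIRE_LEFT_ENGINE = 1   # left orientation engine
    FIRE_MAIN_ENGINE = 2
    FIRE_RIGHT_ENGINE = 3  # right orientation engine


def action_name(action):
    """Map an integer action (e.g. from env.action_space.sample()) to its label."""
    return LanderAction(int(action)).name  # raises ValueError outside [0, 4)
```

Since `IntEnum` members compare equal to plain integers (`LanderAction.FIRE_MAIN_ENGINE == 2`), this makes it easy to translate between labels and the raw action numbers.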

The lab text then describes the reward function for each timestep, which, as I complained earlier, is embedded within the environment.

- Moving from the top of the screen to the landing pad and coming to zero speed is worth around 100-140 points.
- Firing the main engine costs -0.3 points each frame.
- Each leg in contact with the ground is +10 points.
- The episode finishes if the lander crashes (an additional -100 points) or comes to rest (+100 points).
- The game is considered solved if your agent scores 200 points.
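To get a feel for how these terms add up, here is a back-of-the-envelope tally using the values from the list above. It is only a rough sketch: treating the items as independent additive terms is our simplification, since the environment actually computes the reward step by step from the lander's state. The function names and the example numbers are hypothetical.

```python
# Values from the reward list above. Treating them as independent additive
# terms is a simplification: the environment computes the reward step by
# step from the lander's state, so this is only a rough tally.
MAIN_ENGINE_COST_PER_FRAME = 0.3
LEG_CONTACT_BONUS = 10.0
REST_BONUS = 100.0
CRASH_PENALTY = 100.0
SOLVED_THRESHOLD = 200.0


def rough_episode_score(approach_points, main_engine_frames, legs_down, outcome=None):
    """Estimate an episode's score.

    approach_points: points for moving to the pad and stopping (about 100-140)
    main_engine_frames: number of frames the main engine fired
    legs_down: legs in contact with the ground at the end (0, 1 or 2)
    outcome: "rest", "crash", or None if neither terminal bonus applies
    """
    score = approach_points
    score -= MAIN_ENGINE_COST_PER_FRAME * main_engine_frames
    score += LEG_CONTACT_BONUS * legs_down
    if outcome == "rest":
        score += REST_BONUS
    elif outcome == "crash":
        score -= CRASH_PENALTY
    elif outcome is not None:
        raise ValueError(f"unknown outcome: {outcome!r}")
    return score


def is_solved(score):
    """The lab text calls the game solved at 200 points."""
    return score >= SOLVED_THRESHOLD
```

With hypothetical numbers, a landing worth 120 approach points that fires the main engine for 50 frames and ends at rest on both legs tallies to 120 - 15 + 20 + 100 = 225 (solved), while the same landing with 200 frames of main engine comes to 120 - 60 + 20 + 100 = 180 (not solved), which shows why fuel use matters.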