In Colab, you have to click the little [ ] boxes next to the code cells to run them. This only works if you run them in order from top to bottom, so that the imports and earlier definitions exist when later cells need them.

I ran the example. It took a series of random actions, but the episode did not end: after 20 random steps, the lander was still in the air.
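Here is a rough sketch of why the episode can stop without the lander having landed. In a Gym-style loop, an episode only ends when `step()` reports that it is done, so a loop capped at 20 steps can stop while the lander is still falling. `run_random_episode` and its `max_steps` parameter are hypothetical names, not taken from the notebook. The function assumes the Gym `reset()` / `step()` / `action_space.sample()` interface and accepts both the older 4-value and the newer 5-value `step()` return.

```python
def run_random_episode(env, max_steps=20):
    """Take random actions until the episode ends or max_steps is reached.

    Hypothetical helper. Assumes a Gym-style env with reset(), step(action)
    and action_space.sample(). Handles both step() APIs:
      old: (obs, reward, done, info)
      new: (obs, reward, terminated, truncated, info)
    Returns (total_reward, steps_taken, episode_done).
    """
    env.reset()
    total_reward = 0.0
    done = False
    steps = 0
    while not done and steps < max_steps:
        action = env.action_space.sample()  # uniformly random action
        result = env.step(action)
        if len(result) == 5:
            _, reward, terminated, truncated, _ = result
            done = terminated or truncated
        else:
            _, reward, done, _ = result
        total_reward += reward
        steps += 1
    return total_reward, steps, done

# Example (requires gym with Box2D installed):
# import gym
# env = gym.make('LunarLander-v2')
# print(run_random_episode(env, max_steps=20))
```

If the third returned value is False, the loop ran out of steps before the environment ended the episode, which matches what I saw.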

In Step 4, which is where I am now in the Colab notebook, an agent is trained to land correctly on the moon.

A link is given to the LunarLander environment documentation: https://www.gymlibrary.ml/environments/box2d/lunar_lander/

It says it's good to check the documentation for an environment before starting to use it.

We can add that to the homework: read the LunarLander environment documentation before doing any work of our own on a lander model.

Next, here's the code for reviewing the environment:

import gym

env = gym.make('LunarLander-v2')

env.reset()

print('_____OBSERVATION SPACE_____ \n')

print('Observation Space Shape', env.observation_space.shape)

print('Sample observation', env.observation_space.sample()) # get random observation

The output shows that each observation is a vector of 8 floats (shape (8,)). That's all the input the agent gets.

The floats are:

- lander X (horizontal) coordinate, relative to the landing pad

- lander Y (vertical) coordinate, relative to the landing pad

- lander speed X (horizontal)

- lander speed Y (vertical)

- lander angle

- lander angular speed

- left leg contact

- right leg contact
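Because the agent sees only these 8 numbers, it can help to give them names when printing or debugging observations. This is a minimal sketch that assumes the order listed above. `LunarObservation` and `decode_observation` are hypothetical helpers, not part of Gym.

```python
from typing import NamedTuple, Sequence


class LunarObservation(NamedTuple):
    """Named view of the 8-float LunarLander observation (hypothetical helper)."""
    x: float              # horizontal coordinate
    y: float              # vertical coordinate
    vx: float             # horizontal speed
    vy: float             # vertical speed
    angle: float          # lander angle
    angular_speed: float  # lander angular speed
    left_leg: bool        # left leg touching the ground
    right_leg: bool       # right leg touching the ground


def decode_observation(obs: Sequence[float]) -> LunarObservation:
    """Split a raw 8-value observation into named fields."""
    if len(obs) != 8:
        raise ValueError(f"expected 8 values, got {len(obs)}")
    x, y, vx, vy, angle, ang_speed, left, right = (float(v) for v in obs)
    # Leg contacts arrive as floats (1.0 or 0.0); treat > 0.5 as contact.
    return LunarObservation(x, y, vx, vy, angle, ang_speed, left > 0.5, right > 0.5)


# Example (assumes `env` from the code above):
# print(decode_observation(env.observation_space.sample()))
```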

print('\n _____ACTION SPACE_____ \n')

print('Action Space Shape', env.action_space.n) # n = number of discrete actions

print('Action Space Sample', env.action_space.sample()) # random action

The output shows that the action space is discrete: an action is an integer in the range [0, 4), that is, 0, 1, 2, or 3.

In order, from 0 to 3, these integers mean:

- do nothing

- fire left orientation engine

- fire main engine

- fire right orientation engine
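When printing sampled actions, a small lookup table makes the integers readable. This sketch assumes the index order listed above. `ACTION_NAMES` and `describe_action` are hypothetical names, not part of Gym.

```python
# Hypothetical lookup table: action index -> meaning, in the order listed above.
ACTION_NAMES = {
    0: "do nothing",
    1: "fire left orientation engine",
    2: "fire main engine",
    3: "fire right orientation engine",
}


def describe_action(action) -> str:
    """Return a human-readable name for a LunarLander action index."""
    key = int(action)  # sampled actions may be NumPy integers
    if key not in ACTION_NAMES:
        # The action space has 4 discrete actions: 0, 1, 2, 3.
        raise ValueError(f"action must be in range(4), got {action}")
    return ACTION_NAMES[key]


# Example (assumes `env` from the code above):
# a = env.action_space.sample()
# print(a, describe_action(a))
```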

The lab text then describes the reward given at each timestep. As I complained earlier, this reward function is built into the environment.

- Moving from the top of the screen to the landing pad and coming to rest (zero speed) earns about 100-140 points.

- Firing the main engine costs 0.3 points (-0.3) every frame.

- Each leg in contact with the ground earns +10 points.

- The episode finishes if the lander crashes (an additional -100 points) or comes to rest (an additional +100 points).

- The game is considered solved if the agent scores at least 200 points in an episode.
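The reward terms above are easy to sanity-check with a little arithmetic. This is a rough sketch: the constants come from the list above, while `main_engine_cost`, `rough_episode_budget`, and `is_solved` are hypothetical helpers. The budget assumes each bonus is counted once per episode, which simplifies how the environment actually accumulates reward step by step.

```python
MAIN_ENGINE_COST_PER_FRAME = -0.3  # from the reward list above
LEG_CONTACT_BONUS = 10.0           # per leg touching the ground
END_BONUS = 100.0                  # +100 for coming to rest, -100 for crashing
SOLVED_THRESHOLD = 200.0           # episode score needed to count as solved


def main_engine_cost(frames_fired: int) -> float:
    """Total penalty for firing the main engine for a number of frames."""
    return MAIN_ENGINE_COST_PER_FRAME * frames_fired


def rough_episode_budget(approach_points: float, legs_down: int,
                         frames_main_engine: int, came_to_rest: bool) -> float:
    """Hypothetical back-of-the-envelope episode total from the listed terms."""
    total = approach_points + LEG_CONTACT_BONUS * legs_down
    total += main_engine_cost(frames_main_engine)
    total += END_BONUS if came_to_rest else -END_BONUS
    return total


def is_solved(episode_score: float) -> bool:
    """Solved means scoring at least 200 points in an episode."""
    return episode_score >= SOLVED_THRESHOLD
```

For example, with made-up numbers, a landing worth 120 approach points with both legs down, 60 frames of main engine, and coming to rest gives about 120 + 20 - 18 + 100 = 222 points, which clears 200. The same flight ending in a crash drops to about 22, so a clean stop matters far more than saving fuel.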