In Colab, you have to click the little [ ] boxes next to the code blocks to run them, and this only works if you click them from top to bottom, so that the imports and earlier definitions exist.
I ran the example, and it took a bunch of random actions but did not finish the episode: the output shows the lander still in the air after 20 jet thrusts.
In Step 4, which is where I am now in the colab notebook, an agent is trained to land correctly on the moon.
It says it's good to check the documentation for an environment before starting to use it.
We can add that to the homework: check the Lunar Lander environment documentation before doing any work of one's own on a lander model.
Next, here's the code for reviewing the environment:
import gym

env = gym.make('LunarLander-v2')
env.reset()
print('_____OBSERVATION SPACE_____ \n')
print('Observation Space Shape', env.observation_space.shape)
print('Sample observation', env.observation_space.sample()) # get random observation
The output shows the observation space is a vector of 8 floats. That's all the input the agent gets.
The floats are:
- lander X (horizontal) coordinate, relative to the landing pad
- lander Y (vertical) coordinate, relative to the landing pad
- lander speed X (horizontal)
- lander speed Y (vertical)
- lander angle
- lander angular speed
- left leg contact
- right leg contact
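To keep the eight positions straight, here is a minimal sketch that gives each float in the list above a name. It is not part of the lab: `LanderObservation` and `unpack_observation` are hypothetical helpers, the field order simply follows the list, and treating a leg value above 0.5 as "in contact" is an assumption about how the two contact flags are encoded.

```python
from typing import NamedTuple, Sequence


class LanderObservation(NamedTuple):
    """Hypothetical named view of the 8-float observation vector."""
    x: float                 # horizontal coordinate
    y: float                 # vertical coordinate
    vx: float                # horizontal speed
    vy: float                # vertical speed
    angle: float             # lander angle
    angular_velocity: float  # lander angular speed
    left_leg_contact: bool
    right_leg_contact: bool


def unpack_observation(obs: Sequence[float]) -> LanderObservation:
    """Turn a raw 8-element observation into named fields."""
    if len(obs) != 8:
        raise ValueError(f"expected 8 values, got {len(obs)}")
    *motion, left, right = (float(v) for v in obs)
    # Assumption: leg contact is reported as roughly 1.0 (touching) or 0.0 (not).
    return LanderObservation(*motion, left > 0.5, right > 0.5)


# Made-up observation, only to show the field access.
example = unpack_observation([0.0, 1.4, 0.1, -0.5, 0.02, 0.0, 0.0, 1.0])
print(example.vy, example.right_leg_contact)
```

With names attached, something like `example.vy` reads more clearly than `obs[3]` when inspecting what the agent sees.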
print('\n _____ACTION SPACE_____ \n')
print('Action Space Shape', env.action_space.n)
print('Action Space Sample', env.action_space.sample()) # random action
The output shows the action space is a single integer in the range [0, 4).
These integers are:
- do nothing
- fire left orientation engine
- fire main engine
- fire right orientation engine
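A small sketch of how those four integers could be given readable names. `LanderAction` and `describe_action` are hypothetical names of my own, and the numbering assumes the actions are assigned in the order listed above, starting at 0.

```python
from enum import IntEnum


class LanderAction(IntEnum):
    """Hypothetical labels for the four discrete actions, in list order."""
    DO_NOTHING = 0
    FIRE_LEFT_ENGINE = 1
    FIRE_MAIN_ENGINE = 2
    FIRE_RIGHT_ENGINE = 3


def describe_action(action: int) -> str:
    """Map a sampled action integer to a human-readable label."""
    return LanderAction(action).name.replace("_", " ").lower()


print(describe_action(2))  # prints "fire main engine"
```

Because `IntEnum` members compare equal to plain integers, a value such as `LanderAction.FIRE_MAIN_ENGINE` could be passed anywhere the integer 2 is expected.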
The lab text then describes the reward function for each timestep, which is embedded within the environment as I complained earlier.
- Moving from the top of the screen to the landing pad and coming to zero speed is worth around 100-140 points.
- Firing the main engine costs 0.3 points every frame.
- Each leg ground contact is +10 points.
- The episode finishes if the lander crashes (additional -100 points) or comes to rest (+100 points).
- The game is solved if your agent scores 200 points.
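To get a feel for how these numbers add up, here is a rough bookkeeping sketch. It is not the environment's actual reward code, which computes the reward step by step internally: `rough_episode_score` is a hypothetical helper that only sums the items in the list above, and the inputs in the example are made up.

```python
MAIN_ENGINE_COST_PER_FRAME = 0.3
LEG_CONTACT_BONUS = 10.0
CRASH_PENALTY = -100.0
REST_BONUS = 100.0
SOLVED_THRESHOLD = 200.0


def rough_episode_score(approach_points: float,
                        main_engine_frames: int,
                        legs_on_ground: int,
                        crashed: bool) -> float:
    """Sum the listed reward items for one episode (illustrative only)."""
    score = approach_points                                   # roughly 100-140 for a clean approach
    score -= MAIN_ENGINE_COST_PER_FRAME * main_engine_frames  # fuel cost per frame
    score += LEG_CONTACT_BONUS * legs_on_ground               # +10 per leg touching
    score += CRASH_PENALTY if crashed else REST_BONUS         # how the episode ended
    return score


# Made-up episodes: a clean landing versus a crash with the same engine use.
landed = rough_episode_score(120.0, 100, 2, crashed=False)
crashed = rough_episode_score(120.0, 100, 0, crashed=True)
print(round(landed, 1), landed >= SOLVED_THRESHOLD)
print(round(crashed, 1), crashed >= SOLVED_THRESHOLD)
```

The point of the arithmetic is that 100 frames of main engine use eats 30 points, so an agent needs both a clean landing and some fuel discipline to clear the 200 mark.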