In Colab, you have to click the little [ ] boxes next to the code blocks so that they run, and this only succeeds if they're clicked from top to bottom, so that the imports exist etc. I ran the example, and it took a bunch of random actions but did not end the environment, showing the lander is still in the air after 20 jet thrusts.

In Step 4, which is where I am now in the Colab notebook, an agent is trained to land correctly on the moon. A link is given to the LunarLander environment and agent: [1]https://www.gymlibrary.ml/environments/box2d/lunar_lander/

It says it's good to check the documentation for an environment before starting to use it. We can add that to the homework: check the lunar lander environment documentation before doing work of one's own on a lander model.

Next, here's the code for reviewing the environment:

    import gym  # already imported if the earlier cells were run

    env = gym.make('LunarLander-v2')
    env.reset()
    print('_____OBSERVATION SPACE_____ \n')
    print('Observation Space Shape', env.observation_space.shape)
    print('Sample observation', env.observation_space.sample())  # get random observation

The output shows the observation space is a vector of 8 floats. That's all the input the agent gets. The floats are:

- pad X (horizontal) coordinate
- pad Y (vertical) coordinate
- lander speed X (horizontal)
- lander speed Y (vertical)
- lander angle
- lander angular speed
- left leg contact
- right leg contact

    print('\n _____ACTION SPACE_____ \n')
    print('Action Space Shape', env.action_space.n)
    print('Action Space Sample', env.action_space.sample())  # random action

The output shows the action space is an integer in the range [0,4). These integers are, in order from 0 to 3:

- do nothing
- fire left orientation engine
- fire main engine
- fire right orientation engine

The lab text then describes the reward function for each timestep, which is embedded within the environment as I complained earlier.

- Moving from the top of the screen to the landing pad and zero speed is around 100-140 points.
- Firing the main engine is -0.3 points every frame.
- Each leg ground contact is +10 points.
- The episode finishes if the lander crashes (additional -100 points) or comes to rest (+100 points).
- The game is solved if your agent scores 200 points.

References

1. https://www.gymlibrary.ml/environments/box2d/lunar_lander/
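
P.S. To make the reward arithmetic from the bullets concrete, here's a rough sketch that just adds up the listed terms for one made-up episode. The constants come from the lab text quoted above; the function names, the parameters, and the idea of treating the approach reward as a single lump figure are my own assumptions for illustration. The real environment computes reward per timestep and may include terms not listed here, and I'm reading "solved" as "at least 200 points".

```python
# Rough tally of the reward components listed in the lab text.
# Constants are taken from the bullets; everything else (function names,
# parameters, the lump-sum "approach_points") is a hypothetical simplification.

MAIN_ENGINE_COST = 0.3    # points lost per frame the main engine fires
LEG_CONTACT_BONUS = 10    # points per leg touching the ground
CRASH_PENALTY = -100      # added when the episode ends in a crash
REST_BONUS = 100          # added when the lander comes to rest
SOLVED_THRESHOLD = 200    # assumption: "solved" means at least this score


def estimate_episode_score(approach_points, main_engine_frames, legs_down, crashed):
    """Add up the listed reward terms for one hypothetical episode."""
    score = approach_points
    score -= MAIN_ENGINE_COST * main_engine_frames
    score += LEG_CONTACT_BONUS * legs_down
    score += CRASH_PENALTY if crashed else REST_BONUS
    return score


def is_solved(score):
    """True if the score reaches the 'solved' threshold from the lab text."""
    return score >= SOLVED_THRESHOLD


if __name__ == "__main__":
    # A clean landing: decent approach, 100 frames of main engine, both legs down.
    good = estimate_episode_score(120, 100, 2, crashed=False)
    # A crash: poor approach, little thrust, no legs down.
    bad = estimate_episode_score(50, 20, 0, crashed=True)
    print("good landing:", round(good, 1), "solved:", is_solved(good))
    print("crash:", round(bad, 1), "solved:", is_solved(bad))
```

The point is only that a careful landing with modest main-engine use can clear 200, while a crash ends up well below zero once the -100 lands.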