
Ok, that page was so short!!! Back to: Lunar Lander Documentation: https://www.gymlibrary.ml/environments/box2d/lunar_lander
Action Space: Discrete(4) (the action is 1 of 4 integers)
Observation Shape: (8,) (the observation space is an unbounded 8-vector)
Observation High: [inf inf inf inf inf inf inf inf]
Observation Low: [-inf -inf -inf -inf -inf -inf -inf -inf]
Import: gym.make("LunarLander-v2")
Description This environment is a classic rocket trajectory optimization problem. According to Pontryagin’s maximum principle, it is optimal to fire the engine at full throttle or turn it off. This is the reason why this environment has discrete actions: engine on or off.
Aww, shouldn't the model learn this?
There are two environment versions: discrete or continuous. The landing pad is always at coordinates (0,0). The coordinates are the first two numbers in the state vector. Landing outside of the landing pad is possible. Fuel is infinite, so an agent can learn to fly and then land on its first attempt.
To see a heuristic landing, run:
python gym/envs/box2d/lunar_lander.py
Otherwise known as:
pip3 install gym[box2d] && python3 -m gym.envs.box2d.lunar_lander # i think
Action Space There are four discrete actions available: do nothing, fire left orientation engine, fire main engine, fire right orientation engine.
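To keep the four integers straight, here is a small sketch that gives them names. The class and member names are my own (not part of gym), and the integer values assume the actions are numbered 0 to 3 in the order the documentation lists them.

```python
from enum import IntEnum


class LanderAction(IntEnum):
    """Hypothetical names for the four discrete actions.

    Assumption: the integers follow the order used in the documentation.
    """
    NOOP = 0        # do nothing
    FIRE_LEFT = 1   # fire left orientation engine
    FIRE_MAIN = 2   # fire main engine
    FIRE_RIGHT = 3  # fire right orientation engine


def describe(action: int) -> str:
    """Turn an integer action into a readable label."""
    return LanderAction(action).name.lower().replace("_", " ")


print([describe(a) for a in range(4)])
```

Passing an integer outside 0..3 to describe() raises ValueError, which is a cheap way to catch a malformed action before it reaches the environment.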
Observation Space There are 8 states: the coordinates of the lander in x & y, its linear velocities in x & y, its angle, its angular velocity, and two booleans that represent whether each leg is in contact with the ground or not.
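A raw 8-vector is hard to read, so here is a minimal sketch that labels the entries in the order listed above. The names LanderState, parse_observation and distance_to_pad are made up for illustration, and the left-before-right order of the two leg flags is an assumption (the documentation only says "each leg").

```python
import math
from typing import NamedTuple, Sequence


class LanderState(NamedTuple):
    """Hypothetical labels for the 8 observation entries."""
    x: float
    y: float
    vx: float
    vy: float
    angle: float
    angular_velocity: float
    left_leg_contact: bool   # assumption: left leg comes first
    right_leg_contact: bool


def parse_observation(obs: Sequence[float]) -> LanderState:
    """Convert a plain 8-element sequence into a labelled state."""
    if len(obs) != 8:
        raise ValueError(f"expected 8 values, got {len(obs)}")
    *continuous, left, right = obs
    return LanderState(*(float(v) for v in continuous), bool(left), bool(right))


def distance_to_pad(state: LanderState) -> float:
    """Distance to the landing pad, which is always at (0, 0)."""
    return math.hypot(state.x, state.y)


state = parse_observation([0.3, 0.4, 0.0, -0.1, 0.05, 0.0, 1.0, 0.0])
print(state)
print(distance_to_pad(state))
```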
Rewards Reward for moving from the top of the screen to the landing pad and coming to rest is about 100-140 points. If the lander moves away from the landing pad, it loses reward. If the lander crashes, it receives an additional -100 points. If it comes to rest, it receives an additional +100 points. Each leg with ground contact is +10 points. Firing the main engine is -0.3 points each frame. Firing the side engine is -0.03 points each frame. Solved is 200 points.
This is very very similar to the text from Hugging Face's lab.
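To see how these numbers add up, here is a back-of-the-envelope sketch. It only sums the figures quoted above; it is not the environment's actual reward computation, and the function and parameter names are hypothetical. The approach reward (the "about 100-140 points" part) is taken as an input rather than computed.

```python
MAIN_ENGINE_COST = 0.3   # points lost per frame the main engine fires
SIDE_ENGINE_COST = 0.03  # points lost per frame a side engine fires
LEG_CONTACT_BONUS = 10   # points per leg touching the ground
REST_BONUS = 100         # added if the lander comes to rest
CRASH_PENALTY = -100     # added if the lander crashes
SOLVED_THRESHOLD = 200   # "Solved is 200 points"


def rough_episode_return(approach_reward: float, main_frames: int,
                         side_frames: int, legs_down: int,
                         crashed: bool, came_to_rest: bool) -> float:
    """Sum the documented reward terms for one episode (rough estimate)."""
    total = approach_reward
    total -= MAIN_ENGINE_COST * main_frames
    total -= SIDE_ENGINE_COST * side_frames
    total += LEG_CONTACT_BONUS * legs_down
    if crashed:
        total += CRASH_PENALTY
    if came_to_rest:
        total += REST_BONUS
    return total


# Example: a decent approach, 100 main-engine frames, 50 side-engine frames,
# both legs down, and a clean stop.
estimate = rough_episode_return(120, 100, 50, 2, crashed=False, came_to_rest=True)
print(estimate, estimate >= SOLVED_THRESHOLD)
```

With these made-up inputs the fuel cost is small next to the +100 for coming to rest, which suggests the landing itself matters far more than saving fuel.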
Starting State The lander starts at the top center of the viewport with a random initial force applied to its center of mass.
Episode Termination The episode finishes if:
1. the lander crashes (the lander body gets in contact with the moon);
2. the lander gets outside of the viewport (x coordinate is greater than 1);
3. the lander is not awake.
From the Box2D docs, a body which is not awake is a body which doesn’t move and doesn’t collide with any other body:
When Box2D determines that a body (or group of bodies) has come to rest, the body enters a sleep state which has very little CPU overhead. If a body is awake and collides with a sleeping body, then the sleeping body wakes up. Bodies will also wake up if a joint or contact attached to them is destroyed.
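The three termination conditions can be sketched as a single predicate. This is only an illustration of the rule as documented, not the environment's code: the function and argument names are hypothetical, and treating the left edge the same as the right edge (using the absolute value of x) is my assumption, since the documentation only says "greater than 1".

```python
def episode_terminated(body_touched_ground: bool, x: float, awake: bool) -> bool:
    """Return True if any of the three documented conditions holds."""
    crashed = body_touched_ground   # 1. lander body in contact with the moon
    out_of_view = abs(x) > 1.0      # 2. outside the viewport (symmetry assumed)
    at_rest = not awake             # 3. Box2D has put the body to sleep
    return crashed or out_of_view or at_rest


print(episode_terminated(False, 0.2, True))    # still flying
print(episode_terminated(False, 1.3, True))    # drifted out of view
print(episode_terminated(False, 0.0, False))   # came to rest
```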
Arguments To use the continuous environment, you need to specify the continuous=True argument like below:
import gym
env = gym.make("LunarLander-v2", continuous=True)
They don't say what the continuous environment is. It seems like the source code is still a better resource than the documentation. When installed with pip on Linux, the environment source is at ~/.local/lib/python3.*/site-packages/gym/envs/box2d/lunar_lander.py for me. On the web, that's https://github.com/openai/gym/blob/master/gym/envs/box2d/lunar_lander.py . It looks like the documentation on the web is not up to date or is truncated for some reason. The documentation in the source code does indeed continue:
If `continuous=True` is passed, continuous actions (corresponding to the throttle of the engines) will be used and the action space will be `Box(-1, +1, (2,), dtype=np.float32)`. The first coordinate of an action determines the throttle of the main engine, while the second coordinate specifies the throttle of the lateral boosters. Given an action `np.array([main, lateral])`, the main engine will be turned off completely if `main < 0` and the throttle scales affinely from 50% to 100% for `0 <= main <= 1` (in particular, the main engine doesn't work with less than 50% power). Similarly, if `-0.5 < lateral < 0.5`, the lateral boosters will not fire at all. If `lateral < -0.5`, the left booster will fire, and if `lateral > 0.5`, the right booster will fire. Again, the throttle scales affinely from 50% to 100% between -1 and -0.5 (and 0.5 and 1, respectively).

`gravity` dictates the gravitational constant, this is bounded to be within 0 and -12.

If `enable_wind=True` is passed, there will be wind effects applied to the lander. The wind is generated using the function `tanh(sin(2 k (t+C)) + sin(pi k (t+C)))`. `k` is set to 0.01. `C` is sampled randomly between -9999 and 9999. `wind_power` dictates the maximum magnitude of wind.
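The throttle rules in that docstring are easier to check when written out. Below is a minimal sketch that follows the docstring text only, not the environment's source; the function names are hypothetical. The docstring does not say what happens at exactly lateral = -0.5 or 0.5, so treating those as "not firing" is my assumption.

```python
from typing import Optional, Tuple


def main_throttle(main: float) -> float:
    """Main-engine throttle as a fraction of full power (0.0 means off)."""
    if main < 0:
        return 0.0                 # engine turned off completely
    main = min(main, 1.0)          # the action space tops out at +1
    return 0.5 + 0.5 * main        # affine: 0 -> 50%, 1 -> 100%


def lateral_throttle(lateral: float) -> Tuple[Optional[str], float]:
    """Return (which booster fires, throttle fraction) for the lateral input."""
    magnitude = abs(lateral)
    if magnitude <= 0.5:           # assumption: the boundary itself does not fire
        return None, 0.0
    side = "left" if lateral < 0 else "right"
    # Affine from 50% at |lateral| = 0.5 to 100% at |lateral| = 1,
    # which is just the magnitude itself, capped at 1.
    return side, min(magnitude, 1.0)


for action in [(-0.3, 0.0), (0.0, 0.6), (0.5, -0.75), (1.0, 1.0)]:
    print(action, main_throttle(action[0]), lateral_throttle(action[1]))
```

So neither engine can be run gently: each one is either off or at 50% power or more, which fits the bang-bang remark in the Description.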
So you can indeed give the agent a harder challenge by using continuous=True and/or enable_wind=True. As usual, they thought of my concern. This appears to be roughly the full documentation of the LunarLander environment (v2).
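To get a feel for the wind term quoted from the docstring, here is a minimal sketch that evaluates the formula directly. The function names are hypothetical. Three things are assumptions on my part: that t is a time or step counter, that C can be drawn as a uniform float (the docstring only says "sampled randomly"), and that wind_power simply multiplies the tanh term, which is how I read "dictates the maximum magnitude".

```python
import math
import random

K = 0.01  # value of k given in the docstring


def wind_shape(t: float, c: float, k: float = K) -> float:
    """tanh(sin(2 k (t + C)) + sin(pi k (t + C))), always within (-1, 1)."""
    phase = k * (t + c)
    return math.tanh(math.sin(2 * phase) + math.sin(math.pi * phase))


def wind_force(t: float, c: float, wind_power: float) -> float:
    """Assumed scaling: wind_power times the bounded shape term."""
    return wind_power * wind_shape(t, c)


rng = random.Random(0)          # seeded so the run is repeatable
c = rng.uniform(-9999, 9999)    # random offset C
print([round(wind_force(t, c, 15.0), 3) for t in range(0, 500, 100)])
```

Because 2 and pi are incommensurate, the two sine terms never line up into a simple repeating pattern, so the wind drifts slowly and irregularly rather than oscillating at a fixed period.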