Re: [ot][spam][crazy] lab1 docs was: lab1 was: draft: learning RL

9 May 2022

      Ok, that page was so short!!! Back to:

Lunar Lander Documentation:
https://www.gymlibrary.ml/environments/box2d/lunar_lander
...
Action Space: Discrete(4)
Action is 1 of 4 integers
Observation Shape: (8,)
Observation space is an unbounded 8-vector
Observation High: [inf inf inf inf inf inf inf inf]
Observation Low: [-inf -inf -inf -inf -inf -inf -inf -inf]
Import: gym.make("LunarLander-v2")
...
Description
This environment is a classic rocket trajectory optimization problem. According to
Pontryagin’s maximum principle, it is optimal to fire the engine at full throttle or turn it off.
This is the reason why this environment has discrete actions: engine on or off.
Aww shouldn't the model learn this?
...
There are two environment versions: discrete or continuous. The landing pad is always at
coordinates (0,0). The coordinates are the first two numbers in the state vector. Landing
outside of the landing pad is possible. Fuel is infinite, so an agent can learn to fly and then
land on its first attempt.
To see a heuristic landing, run:
python gym/envs/box2d/lunar_lander.py
Otherwise known as:
pip3 install 'gym[box2d]' && python3 -m gym.envs.box2d.lunar_lander # i think
...
Action Space
There are four discrete actions available: do nothing, fire left orientation engine, fire main
engine, fire right orientation engine.
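To keep the four integers straight, here is a minimal sketch of the discrete actions as an enum. The names are made up for illustration, and the numbering assumes the integers 0..3 follow the order of the list above, which this page does not state outright.

```python
from enum import IntEnum


class LanderAction(IntEnum):
    # Hypothetical names; indices assume the order of the quoted list.
    NOOP = 0        # do nothing
    FIRE_LEFT = 1   # fire left orientation engine
    FIRE_MAIN = 2   # fire main engine
    FIRE_RIGHT = 3  # fire right orientation engine


def describe(action: int) -> str:
    """Return a readable name for an integer action, or raise ValueError."""
    return LanderAction(action).name.lower()


if __name__ == "__main__":
    for a in range(4):
        print(a, describe(a))
```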
Observation Space
There are 8 states: the coordinates of the lander in x & y, its linear velocities in x & y, its
angle, its angular velocity, and two booleans that represent whether each leg is in contact
with the ground or not.
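A small sketch of how the 8-vector could be unpacked into named fields. The field names are my own, the order assumes the vector follows the listing above, and which leg comes first is a guess.

```python
from typing import NamedTuple, Sequence


class LanderState(NamedTuple):
    # Hypothetical field names; order assumed from the quoted description.
    x: float
    y: float
    vx: float
    vy: float
    angle: float
    angular_velocity: float
    left_leg_contact: bool   # assumption: left leg is listed first
    right_leg_contact: bool


def unpack(obs: Sequence[float]) -> LanderState:
    """Turn a raw 8-element observation into a LanderState."""
    if len(obs) != 8:
        raise ValueError("expected 8 values, got %d" % len(obs))
    x, y, vx, vy, angle, ang_vel, left, right = obs
    return LanderState(
        float(x), float(y), float(vx), float(vy),
        float(angle), float(ang_vel), bool(left), bool(right),
    )


if __name__ == "__main__":
    print(unpack([0.0, 1.4, 0.1, -0.2, 0.05, 0.0, 1.0, 0.0]))
```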
...
Rewards
Reward for moving from the top of the screen to the landing pad and coming to rest is
about 100-140 points. If the lander moves away from the landing pad, it loses reward. If the
lander crashes, it receives an additional -100 points. If it comes to rest, it receives an
additional +100 points. Each leg with ground contact is +10 points. Firing the main engine
is -0.3 points each frame. Firing the side engine is -0.03 points each frame. Solved is 200
points.
This is very very similar to the text from huggingface's lab.
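For a feel of the numbers, here is a rough tally of the reward components quoted above. This is only bookkeeping: the function and its parameters are hypothetical, and the real environment computes reward step by step, so the total it reports will differ.

```python
def rough_episode_score(
    approach_reward: float,
    main_frames: int,
    side_frames: int,
    legs_down: int = 0,
    came_to_rest: bool = False,
    crashed: bool = False,
) -> float:
    """Add up the quoted reward terms for one imagined episode."""
    score = approach_reward            # "about 100-140 points" for reaching the pad
    if came_to_rest:
        score += 100.0                 # additional +100 for coming to rest
    if crashed:
        score -= 100.0                 # additional -100 for crashing
    score += 10.0 * legs_down          # +10 per leg with ground contact
    score -= 0.3 * main_frames         # main engine cost per frame
    score -= 0.03 * side_frames        # side engine cost per frame
    return score


if __name__ == "__main__":
    # A clean landing that used the main engine for 100 frames.
    print(rough_episode_score(120.0, 100, 50, legs_down=2, came_to_rest=True))
```

With these figures, fuel use alone can pull an otherwise good landing under the 200-point "solved" mark, which is presumably why the engine penalties exist.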
...
Starting State
The lander starts at the top center of the viewport with a random initial force applied to its
center of mass.
...
Episode Termination
The episode finishes if:
1. the lander crashes (the lander body gets in contact with the moon);
2. the lander gets outside of the viewport (x coordinate is greater than 1);
3. the lander is not awake. From the Box2D docs, a body which is not awake is a
body which doesn’t move and doesn’t collide with any other body:
...
...
When Box2D determines that a body (or group of bodies) has come to rest, the body
enters a sleep state which has very little CPU overhead. If a body is awake and collides
with a sleeping body, then the sleeping body wakes up. Bodies will also wake up if a joint
or contact attached to them is destroyed.
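The three termination conditions can be sketched as a single predicate. This is my own paraphrase, not the environment's code: the argument names are hypothetical, and I'm assuming the viewport is symmetric, so leaving on the left (x below -1) counts too, even though the text only mentions x greater than 1.

```python
def episode_finished(crashed: bool, x: float, awake: bool) -> bool:
    """Return True if any of the quoted termination conditions holds."""
    # Assumption: out of view on either side, not only x > 1.
    out_of_view = abs(x) > 1.0
    # "Not awake" is Box2D's sleep state: the body has come to rest.
    return crashed or out_of_view or not awake


if __name__ == "__main__":
    print(episode_finished(crashed=False, x=0.2, awake=True))   # still flying
    print(episode_finished(crashed=False, x=0.0, awake=False))  # at rest
```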
...
Arguments
To use the continuous environment, you need to specify the continuous=True argument
like below:
...
import gym
env = gym.make("LunarLander-v2", continuous=True)
They don't say what the continuous environment is. It seems like
source code is still a better resource than documentation.

When installed with pip on Linux, the environment source is at
~/.local/lib/python3.*/site-packages/gym/envs/box2d/lunar_lander.py
for me. On the web, that's
https://github.com/openai/gym/blob/master/gym/envs/box2d/lunar_lander.py

It looks like the documentation on the web is not up to date or is
truncated for some reason. The documentation in the source code does
indeed continue:
...
If `continuous=True` is passed, continuous actions (corresponding to the throttle of the engines) will be used and the
   action space will be `Box(-1, +1, (2,), dtype=np.float32)`.
   The first coordinate of an action determines the throttle of the main engine, while the second
   coordinate specifies the throttle of the lateral boosters.
   Given an action `np.array([main, lateral])`, the main engine will be turned off completely if
   `main < 0` and the throttle scales affinely from 50% to 100% for `0 <= main <= 1` (in particular, the
   main engine doesn't work  with less than 50% power).
   Similarly, if `-0.5 < lateral < 0.5`, the lateral boosters will not fire at all. If `lateral < -0.5`, the left
   booster will fire, and if `lateral > 0.5`, the right booster will fire. Again, the throttle scales affinely
   from 50% to 100% between -1 and -0.5 (and 0.5 and 1, respectively).
   `gravity` dictates the gravitational constant; this is bounded to be within 0 and -12.
   If `enable_wind=True` is passed, there will be wind effects applied to the lander.
   The wind is generated using the function `tanh(sin(2 k (t+C)) + sin(pi k (t+C)))`.
   `k` is set to 0.01.
   `C` is sampled randomly between -9999 and 9999.
   `wind_power` dictates the maximum magnitude of wind.
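The throttle rules for `np.array([main, lateral])` in the excerpt above can be sketched as two small functions. These are written from the docstring text only, not copied from the gym source. The function names are hypothetical, and treating exactly |lateral| = 0.5 as "off" and clamping inputs to the Box bounds are my assumptions.

```python
from typing import Tuple


def main_throttle(main: float) -> float:
    """Fraction of full main-engine power; 0.0 means the engine is off."""
    if main < 0:
        return 0.0
    main = min(main, 1.0)      # assumption: clamp to the Box upper bound
    return 0.5 + 0.5 * main    # affine: 0 -> 50%, 1 -> 100%


def lateral_throttle(lateral: float) -> Tuple[str, float]:
    """Return (which booster fires, fraction of full power)."""
    magnitude = min(abs(lateral), 1.0)
    if magnitude <= 0.5:       # assumption: the boundary itself does not fire
        return ("off", 0.0)
    side = "left" if lateral < 0 else "right"
    # Affine from 50% at |lateral| = 0.5 to 100% at |lateral| = 1,
    # which works out to the magnitude itself.
    return (side, magnitude)


if __name__ == "__main__":
    print(main_throttle(0.5), lateral_throttle(-0.75))
```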
So, you can indeed provide a harder challenge to the agent by using
continuous=True and/or enable_wind=True. As usual, they thought of
my concern.
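To see what the wind looks like, here is a minimal sketch of the quoted formula `tanh(sin(2 k (t+C)) + sin(pi k (t+C)))` scaled by `wind_power`. The function names are mine, and drawing `C` from a uniform float distribution is an assumption, since the docstring only says it is sampled randomly between -9999 and 9999.

```python
import math
import random

K = 0.01  # the constant k from the quoted docstring


def sample_offset(rng: random.Random) -> float:
    """Draw C between -9999 and 9999 (assumed uniform)."""
    return rng.uniform(-9999, 9999)


def wind_magnitude(t: float, c: float, wind_power: float) -> float:
    """Wind at step t for offset c, bounded in size by wind_power."""
    phase = K * (t + c)
    return math.tanh(math.sin(2 * phase) + math.sin(math.pi * phase)) * wind_power


if __name__ == "__main__":
    rng = random.Random(0)
    c = sample_offset(rng)
    for t in range(0, 500, 100):
        print(t, wind_magnitude(t, c, wind_power=10.0))
```

Because tanh stays strictly between -1 and 1, the result never exceeds `wind_power` in magnitude, matching the "maximum magnitude" wording.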

This appears to roughly be the full documentation of the LunarLander
environment (v2).

Undiscussed Horrific Abuse, One Victim of Many