[ot][spam][crazy] lab1 was: draft: learning RL

Mon May 9 06:46:35 PDT 2022

In colab, you have to click the little [ ] boxes next to the code blocks,
so that they run, and this only succeeds if they're clicked from top to
bottom so that the imports exist etc.

I ran the example, and it took a bunch of random actions, but the episode
did not end: the lander is still in the air after 20 jet thrusts.

In Step 4, which is where I am now in the colab notebook, an agent is
trained to land correctly on the moon.

A link is given to the LunarLander environment and agent:
https://www.gymlibrary.ml/environments/box2d/lunar_lander/

It says it's good to check the documentation for an environment before
starting to use it.

We can add that to the homework: check the lunar lander environment
documentation before doing work of one's own on a lander model.

Next, here's the code for reviewing the environment:

import gym

env = gym.make('LunarLander-v2')
env.reset()
print('_____OBSERVATION SPACE_____ \n')
print('Observation Space Shape', env.observation_space.shape)
# get a random observation
print('Sample observation', env.observation_space.sample())

The output shows the observation space is a vector of 8 floats. That's all
the input the agent gets.

The floats are:
- lander X (horizontal) coordinate, relative to the pad
- lander Y (vertical) coordinate, relative to the pad
- lander speed X (horizontal)
- lander speed Y (vertical)
- lander angle
- lander angular speed
- left leg contact
- right leg contact
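
To keep the eight slots straight, here is a small sketch that labels a raw
observation vector. It assumes the order listed above; the class name, field
names, and the 0.5 threshold for the leg flags are my own choices, not from
the lab or from gym.

```python
from typing import NamedTuple, Sequence


class LanderObservation(NamedTuple):
    """Hypothetical labels for the 8-value observation, in the listed order."""
    x: float
    y: float
    vx: float
    vy: float
    angle: float
    angular_velocity: float
    left_leg_contact: bool
    right_leg_contact: bool


def label_observation(obs: Sequence[float]) -> LanderObservation:
    """Turn a raw 8-element vector into named fields."""
    if len(obs) != 8:
        raise ValueError(f"expected 8 values, got {len(obs)}")
    continuous = [float(v) for v in obs[:6]]
    # Assumption: the leg flags arrive as 0.0 or 1.0, so threshold at 0.5.
    left = bool(obs[6] > 0.5)
    right = bool(obs[7] > 0.5)
    return LanderObservation(*continuous, left, right)
```

Something like label_observation(env.observation_space.sample()) should then
print with names attached, though a random sample is not a physically
meaningful state.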

print('\n _____ACTION SPACE_____ \n')
print('Action Space Shape', env.action_space.n)
print('Action Space Sample', env.action_space.sample()) # random action

The output shows the action space is an integer in the range [0,4).

These integers are:
- do nothing
- fire left orientation engine
- fire main engine
- fire right orientation engine
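
A sketch for giving those integers names, assuming they are numbered 0 to 3
in the order listed above. The enum and its member names are hypothetical,
made up for readability, and are not part of gym.

```python
from enum import IntEnum


class LanderAction(IntEnum):
    """Hypothetical names for the four discrete actions, in the listed order."""
    NOOP = 0
    FIRE_LEFT = 1
    FIRE_MAIN = 2
    FIRE_RIGHT = 3


def describe_action(action: int) -> str:
    """Map a sampled integer to a readable name; raises ValueError outside 0..3."""
    return LanderAction(action).name
```

Because IntEnum members are plain integers, a member such as
LanderAction.FIRE_MAIN can be passed anywhere the integer 2 is expected.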

The lab text then describes the reward function for each timestep, which is
embedded within the environment as I complained earlier.

- Moving from the top of the screen to the landing pad at zero speed is
worth around 100-140 points.
- Firing the main engine costs 0.3 points every frame.
- Each leg ground contact is +10 points.
- The episode finishes if the lander crashes (additional -100 points) or
comes to rest (+100 points).
- The game is solved if your agent scores 200 points.
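
To get a feel for the arithmetic, here is a rough tally built only from the
numbers in the list above. It is not the environment's actual reward code,
which hands out reward step by step; the function and parameter names are
hypothetical, and it assumes the episode ends either in a crash or at rest.

```python
MAIN_ENGINE_COST = 0.3     # points lost per frame the main engine fires
LEG_CONTACT_BONUS = 10.0   # points per leg touching the ground
CRASH_PENALTY = -100.0     # added if the lander crashes
REST_BONUS = 100.0         # added if the lander comes to rest
SOLVED_THRESHOLD = 200.0   # score at which the game counts as solved


def rough_episode_return(approach_points: float,
                         main_engine_frames: int,
                         legs_in_contact: int,
                         crashed: bool) -> float:
    """Add up the listed reward terms for one whole episode."""
    total = approach_points                       # roughly 100-140 for a clean descent
    total -= MAIN_ENGINE_COST * main_engine_frames
    total += LEG_CONTACT_BONUS * legs_in_contact
    total += CRASH_PENALTY if crashed else REST_BONUS
    return total


def is_solved(episode_return: float) -> bool:
    """Assumption: 'solved' means reaching at least the 200-point mark."""
    return episode_return >= SOLVED_THRESHOLD
```

For example, 120 approach points, 50 frames of main engine, both legs down,
and coming to rest gives 120 - 15 + 20 + 100, which clears 200. The same
descent ending in a crash with no leg contact gives 120 - 15 - 100, nowhere
near it.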