
Maybe I'll read the next page of "getting started" and then jump to PPO and the API interfaces. https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html RL Resource Page: https://stable-baselines3.readthedocs.io/en/master/guide/rl.html Oh! A normative tutorial: https://github.com/araffin/rl-tutorial-jnrr19/tree/sb3
Reinforcement Learning differs from other machine learning methods in several ways. The data used to train the agent is collected through the agent's own interactions with the environment (compared to supervised learning, for instance, where you have a fixed dataset). This dependence can lead to a vicious circle: if the agent collects poor-quality data (e.g., trajectories with no rewards), it will not improve and will continue to amass bad trajectories.
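To make that loop concrete, here's a minimal sketch of online data collection in a Gymnasium environment. CartPole-v1 and the random policy are stand-ins for illustration; the point is that whatever the agent's own actions produce is the only data it gets to learn from.

```python
# Minimal sketch of the RL data-collection loop: the "dataset" is built
# online, step by step, from the agent's own actions.
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
trajectory = []  # grows as the agent acts; there is no fixed dataset

for _ in range(100):
    action = env.action_space.sample()  # stand-in for the agent's policy
    next_obs, reward, terminated, truncated, info = env.step(action)
    trajectory.append((obs, action, reward))  # poor actions -> poor data
    obs = next_obs
    if terminated or truncated:
        obs, info = env.reset()

env.close()
```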
This factor, among others, explains why results in RL may vary from one run to another (i.e., when only the seed of the pseudo-random number generator changes). For this reason, you should always do several runs to obtain quantitative results.
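For instance, a minimal multi-seed evaluation with SB3's PPO might look like the sketch below; CartPole-v1, the training budget, and the number of seeds are illustrative assumptions.

```python
# Sketch: train the same agent under several seeds and report mean +/- std,
# since a single run can be misleadingly good or bad.
import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

returns = []
for seed in range(5):
    model = PPO("MlpPolicy", "CartPole-v1", seed=seed, verbose=0)
    model.learn(total_timesteps=10_000)  # short budget, for illustration only
    mean_reward, _ = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
    returns.append(mean_reward)

print(f"mean={np.mean(returns):.1f} +/- {np.std(returns):.1f} over {len(returns)} seeds")
```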
Good results in RL generally depend on finding appropriate hyperparameters. Recent algorithms (PPO, SAC, TD3) normally require little hyperparameter tuning; however, don't expect the defaults to work in every environment.
Therefore, we highly recommend taking a look at the RL Zoo (or the original papers) for tuned hyperparameters. A best practice when applying RL to a new problem is automatic hyperparameter optimization, which is also included in the RL Zoo; see the sketch below.
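A minimal sketch of such a search using Optuna (the optimizer the RL Zoo builds its tuning on), here varying only PPO's learning rate; the search space, environment, and budgets are assumptions for illustration, and a real search would tune more parameters with pruning.

```python
# Sketch: automatic hyperparameter optimization with Optuna.
import optuna
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

def objective(trial: optuna.Trial) -> float:
    # Illustrative search space: learning rate only, sampled log-uniformly.
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    model = PPO("MlpPolicy", "CartPole-v1", learning_rate=lr, verbose=0)
    model.learn(total_timesteps=10_000)
    mean_reward, _ = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
    return mean_reward

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```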
When applying RL to a custom problem, you should always normalize the input to the agent (e.g., using VecNormalize for PPO/A2C) and look at common preprocessing done for other environments (e.g., frame-stacking for Atari). Refer to the "Tips and Tricks when creating a custom environment" section of the same docs page for more advice on custom environments.
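A minimal sketch of that normalization with VecNormalize around a vectorized environment for PPO; Pendulum-v1, the number of envs, and the timestep budget are assumptions for illustration.

```python
# Sketch: normalize observations and rewards with a running estimate
# before they reach the agent.
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecNormalize

env = make_vec_env("Pendulum-v1", n_envs=4)
env = VecNormalize(env, norm_obs=True, norm_reward=True)

model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)

# The running statistics are part of the agent's preprocessing:
# save them alongside the weights and reload them at test time.
env.save("vec_normalize.pkl")
```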
Continues around https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html#how-to... (skipping the limitations section, which could feel discouraging; the SB3 architectures are tuned for long and diverse training runs.) The first tutorial is at https://colab.research.google.com/github/araffin/rl-tutorial-jnrr19/blob/sb3...