[ot][spam][crazy] lab1 docs was: lab1 was: draft: learning RL

Undiscussed Horrific Abuse, One Victim of Many gmkarl at gmail.com
Mon May 9 17:37:11 PDT 2022


Maybe I'll read the next page of "getting started" and then jump to
PPO and the API interfaces.
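Here's a minimal sketch of what that PPO API looks like (my own guess at a quickstart, not taken from the tutorial), assuming gym's CartPole-v1 as a stand-in environment and the older gym API where reset() returns just the observation:

import gym
from stable_baselines3 import PPO

# CartPole-v1 is an assumed example environment, nothing specific to this thread.
env = gym.make("CartPole-v1")

# "MlpPolicy" selects a small fully connected policy/value network.
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

# Roll the trained policy out for one episode (pre-0.26 gym API:
# reset() returns the observation, step() returns a 4-tuple).
obs = env.reset()
done = False
while not done:
    action, _state = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)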

https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html

RL Resource Page:
https://stable-baselines3.readthedocs.io/en/master/guide/rl.html

Oh! A normative tutorial: https://github.com/araffin/rl-tutorial-jnrr19/tree/sb3

> Reinforcement Learning differs from other machine learning methods in several ways. The
> data used to train the agent is collected through interactions with the environment by the
> agent itself (compared to supervised learning where you have a fixed dataset for instance).
> This dependence can lead to a vicious circle: if the agent collects poor quality data (e.g.,
> trajectories with no rewards), then it will not improve and continue to amass bad
> trajectories.
>
> This factor, among others, explains why results in RL may vary from one run to another
> (i.e., when only the seed of the pseudo-random generator changes). For this reason, you
> should always do several runs to have quantitative results.
>
> Good results in RL are generally dependent on finding appropriate hyperparameters.
> Recent algorithms (PPO, SAC, TD3) normally require little hyperparameter tuning;
> however, don’t expect the default ones to work on any environment.
>
> Therefore, we highly recommend you take a look at the RL zoo (or the original papers)
> for tuned hyperparameters. A best practice when you apply RL to a new problem is to do
> automatic hyperparameter optimization. Again, this is included in the RL zoo.
>
> When applying RL to a custom problem, you should always normalize the input to the
> agent (e.g. using VecNormalize for PPO/A2C) and look at common preprocessing done on
> other environments (e.g. for Atari, frame-stack, …). Please refer to Tips and Tricks when
> creating a custom environment paragraph below for more advice related to custom
> environments.
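As a concrete sketch of the normalization advice quoted above (VecNormalize wrapping a vectorized env before PPO), assuming Pendulum-v1 as the environment; the exact flags are illustrative, not prescribed by the docs:

import gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

# PPO/A2C expect a vectorized environment; wrap a single env in DummyVecEnv.
env = DummyVecEnv([lambda: gym.make("Pendulum-v1")])

# VecNormalize keeps running mean/std statistics of observations
# (and optionally rewards) and normalizes them on the fly.
env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10.0)

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

# The normalization statistics have to be saved alongside the model
# and reloaded (with training disabled) at evaluation time.
env.save("vecnormalize.pkl")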

The guide continues at
https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html#how-to-evaluate-an-rl-algorithm
(skipping the limitations section, which could feel discouraging; the
SB3 architectures are tuned for long, diverse training runs.)
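As far as I can tell the evaluation section boils down to something like this: evaluate on a separate, Monitor-wrapped environment over several episodes. A hedged sketch, assuming the `model` from the earlier snippet and CartPole-v1 again; the episode count is arbitrary:

import gym
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor

# A separate eval env, wrapped in Monitor so episode rewards/lengths are recorded.
eval_env = Monitor(gym.make("CartPole-v1"))

# Average over many episodes (and ideally several training seeds, per the quote above).
mean_reward, std_reward = evaluate_policy(
    model, eval_env, n_eval_episodes=20, deterministic=True
)
print(f"mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")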

The first tutorial is at
https://colab.research.google.com/github/araffin/rl-tutorial-jnrr19/blob/sb3/1_getting_started.ipynb
