
Maybe I'll read the next page of "getting started" and then jump to PPO and the API interfaces. https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html RL Resource Page: https://stable-baselines3.readthedocs.io/en/master/guide/rl.html Oh! A normative tutorial: https://github.com/araffin/rl-tutorial-jnrr19/tree/sb3
Reinforcement Learning differs from other machine learning methods in several ways. The data used to train the agent is collected through the agent's own interactions with the environment (compared to supervised learning, for instance, where you have a fixed dataset). This dependence can lead to a vicious circle: if the agent collects poor-quality data (e.g., trajectories with no rewards), it will not improve and will continue to amass bad trajectories.
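To make that loop concrete, here's a minimal sketch of online data collection in a Gymnasium environment. CartPole-v1 and the random policy are stand-ins for illustration; the point is that whatever the agent's own actions produce is the only data it gets to learn from.

```python
# Minimal sketch of the RL data-collection loop: the "dataset" is built
# online, step by step, from the agent's own actions.
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
trajectory = []  # grows as the agent acts; there is no fixed dataset

for _ in range(100):
    action = env.action_space.sample()  # stand-in for the agent's policy
    next_obs, reward, terminated, truncated, info = env.step(action)
    trajectory.append((obs, action, reward))  # poor actions -> poor data
    obs = next_obs
    if terminated or truncated:
        obs, info = env.reset()

env.close()
```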
This factor, among others, explains why results in RL may vary from one run to another (i.e., when only the seed of the pseudo-random number generator changes). For this reason, you should always do several runs to obtain quantitative results.
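For instance, a minimal multi-seed evaluation with SB3's PPO might look like the sketch below; CartPole-v1, the training budget, and the number of seeds are illustrative assumptions.

```python
# Sketch: train the same agent under several seeds and report mean +/- std,
# since a single run can be misleadingly good or bad.
import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

returns = []
for seed in range(5):
    model = PPO("MlpPolicy", "CartPole-v1", seed=seed, verbose=0)
    model.learn(total_timesteps=10_000)  # short budget, for illustration only
    mean_reward, _ = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
    returns.append(mean_reward)

print(f"mean={np.mean(returns):.1f} +/- {np.std(returns):.1f} over {len(returns)} seeds")
```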
Good results in RL generally depend on finding appropriate hyperparameters. Recent algorithms (PPO, SAC, TD3) normally require little hyperparameter tuning; however, don't expect the defaults to work in every environment.
Therefore, we highly recommend taking a look at the RL Zoo (or the original papers) for tuned hyperparameters. A best practice when applying RL to a new problem is automatic hyperparameter optimization, which is also included in the RL Zoo; see the sketch below.
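A minimal sketch of such a search using Optuna (the optimizer the RL Zoo builds its tuning on), here varying only PPO's learning rate; the search space, environment, and budgets are assumptions for illustration, and a real search would tune more parameters with pruning.

```python
# Sketch: automatic hyperparameter optimization with Optuna.
import optuna
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

def objective(trial: optuna.Trial) -> float:
    # Illustrative search space: learning rate only, sampled log-uniformly.
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    model = PPO("MlpPolicy", "CartPole-v1", learning_rate=lr, verbose=0)
    model.learn(total_timesteps=10_000)
    mean_reward, _ = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
    return mean_reward

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```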
When applying RL to a custom problem, you should always normalize the input to the agent (e.g., using VecNormalize for PPO/A2C) and look at common preprocessing done for other environments (e.g., frame-stacking for Atari). Refer to the "Tips and Tricks when creating a custom environment" section of the same docs page for more advice on custom environments.
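A minimal sketch of that normalization with VecNormalize around a vectorized environment for PPO; Pendulum-v1, the number of envs, and the timestep budget are assumptions for illustration.

```python
# Sketch: normalize observations and rewards with a running estimate
# before they reach the agent.
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecNormalize

env = make_vec_env("Pendulum-v1", n_envs=4)
env = VecNormalize(env, norm_obs=True, norm_reward=True)

model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)

# The running statistics are part of the agent's preprocessing:
# save them alongside the weights and reload them at test time.
env.save("vec_normalize.pkl")
```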
Continues around https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html#how-to... (skipping the limitations section, which could feel discouraging; the SB3 architectures are tuned for long and diverse training runs.) The first tutorial is at https://colab.research.google.com/github/araffin/rl-tutorial-jnrr19/blob/sb3...