[ot][spam] Behavior Log For Control Data: HFRL Unit 1 Lab

Fri Jun 24 05:59:41 PDT 2022

0859 The next task is this:

Step 6: Train the PPO agent 🏃
Let's train our agent for 500,000 timesteps; don't forget to use a GPU
on Colab. It will take approximately 10 min, but you can use fewer
timesteps if you just want to try it out.

I plan to try it out with a small number of timesteps. My first
approach to finding out how to do this will be to scroll up in the lab.
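Before shrinking the budget, here is a rough way to estimate what a short trial costs. This is a minimal sketch built on assumptions the log does not state. It assumes the lab uses Stable-Baselines3's `PPO`, where training is started with `model.learn(total_timesteps=...)`. It also assumes hypothetical rollout settings of `N_STEPS = 1024` per environment and `N_ENVS = 16` parallel environments. Finally, it assumes run time scales linearly from the lab's "500,000 timesteps in ~10 min" figure. `plan_trial` is a hypothetical helper, not part of any library.

```python
import math

# Hypothetical rollout settings (assumptions, not taken from the log):
# PPO collects N_STEPS steps from each of N_ENVS parallel environments
# before every policy update.
N_STEPS = 1024
N_ENVS = 16

# From the lab text: 500,000 timesteps take roughly 10 minutes on a Colab GPU.
FULL_TIMESTEPS = 500_000
FULL_MINUTES = 10.0


def plan_trial(total_timesteps: int) -> tuple[int, int, float]:
    """Return (rollouts, timesteps actually collected, estimated minutes)."""
    per_rollout = N_STEPS * N_ENVS
    # Assumption: the training loop only checks the budget between full
    # rollouts, so it rounds the request up to a multiple of per_rollout.
    rollouts = math.ceil(total_timesteps / per_rollout)
    collected = rollouts * per_rollout
    # Naive linear scaling of the lab's ~10 min estimate (also an assumption).
    minutes = FULL_MINUTES * collected / FULL_TIMESTEPS
    return rollouts, collected, minutes


for budget in (500_000, 50_000, 10_000):
    rollouts, collected, minutes = plan_trial(budget)
    print(f"{budget:>7} requested -> {rollouts:>2} rollouts, "
          f"{collected:>7} collected, ~{minutes:.1f} min")
```

Under these assumed settings, even a 10,000-step trial still collects one full rollout of 16,384 steps. That means going much smaller than one rollout would not save any more time.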