24 Jun
2022
9:59 p.m.
0859 The next task is this: "Step 6: Train the PPO agent 🏃 Let's train our agent for 500,000 timesteps, don't forget to use GPU on Colab. It will take approximately ~10min, but you can use less timesteps if you just want to try it out." I plan to try it out with a small number of timesteps. My first approach to finding out how to do this will be to scroll up in the lab.
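To pick a shorter run, a rough back-of-the-envelope sketch might help. It takes the only figures the step gives (500,000 timesteps in about 10 minutes on a Colab GPU) and assumes training time scales linearly with the number of timesteps, which is my assumption and not something the lab states. The function name and constants are hypothetical, made up for this note.

```python
# Reference figures quoted in the lab step:
# 500,000 timesteps in roughly 10 minutes on a Colab GPU.
REFERENCE_TIMESTEPS = 500_000
REFERENCE_MINUTES = 10.0


def estimate_training_minutes(timesteps: int) -> float:
    """Estimate wall-clock minutes for a run of `timesteps`.

    Assumption (not from the lab): time scales linearly with timesteps.
    Real runs also pay fixed setup costs, so short runs may take longer
    than this suggests.
    """
    if timesteps < 0:
        raise ValueError("timesteps must be non-negative")
    return REFERENCE_MINUTES * timesteps / REFERENCE_TIMESTEPS


if __name__ == "__main__":
    # Compare a few candidate budgets against the full 500,000-step run.
    for steps in (10_000, 50_000, 500_000):
        minutes = estimate_training_minutes(steps)
        print(f"{steps:>7,} timesteps: about {minutes:.1f} min")
```

Under that assumption, a 50,000-timestep trial would come out at about a minute, which seems short enough for a first attempt before committing to the full run.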