The lab says huggingface's model hub, which I mostly use as a remote
   server to store pretrained language models and to send data to my
   government on when and where I use them, now has deep reinforcement
   learning models available at
   [1]https://huggingface.co/models?pipeline_tag=reinforcement-learning&sort=downloads
   Here's the import code, retyped:
   import gym
   from huggingface_sb3 import load_from_hub, package_to_hub, push_to_hub
   # notebook_login is for uploading to your account from a notebook
   from huggingface_hub import notebook_login
   from stable_baselines3 import PPO
   from stable_baselines3.common.evaluation import evaluate_policy
   from stable_baselines3.common.env_util import make_vec_env
   Of course, uploading to the hub is possibly a very bad idea unless you
   are an experienced activist or researcher or spy, or have something
   important to share with your government or huggingface, or are only
   doing this casually and might get a job in it one day.
   The lab then provides an intro to Gym, a Python library made by
   OpenAI. In the opinion of my pessimistic half, it has the effect of
   making it hard to take technologies out of research, by verbosifying
   the construction of useful environments under the assumption that
   they are only for testing model architectures.
   The lab says Gym is used a lot, and provides:
   - an interface to create RL environments
   - a collection of environments
   This is true.
   They visually redescribe that an agent performs actions in an
   environment, which then returns a reward and a state to the agent.
   This coupling of reward with the environment, rather than with the
   agent, which would usually have goals of its own, is possibly part of
   the verbosifying.
   Maybe environment is more "environment interface", I'm actually having
   trouble thinking here. I always get confused around gym environments.
   Maybe jocks make better programmers nowadays.
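   To pin down what I mean by the reward living with the agent instead,
   here is a rough sketch in plain Python. It is not how Gym works, and
   every name in it (StateOnlyWorld, GoalSeekingAgent, run_decoupled) is
   hypothetical. The world only reports state; the agent scores each
   transition against its own goal.

```python
class StateOnlyWorld:
    """Hypothetical world: tracks a position and reports it, nothing else."""

    def __init__(self):
        self.position = 0

    def observe(self):
        return self.position

    def apply(self, action):
        # action is -1, 0 or +1; the world returns only the new state
        self.position += action
        return self.position


class GoalSeekingAgent:
    """Hypothetical agent that owns both its policy and its reward."""

    def __init__(self, goal):
        self.goal = goal

    def act(self, state):
        if state == self.goal:
            return 0
        return 1 if state < self.goal else -1

    def reward(self, state, next_state):
        # the agent's own reward: how much closer to its goal it got
        return abs(self.goal - state) - abs(self.goal - next_state)


def run_decoupled(world, agent, steps):
    total = 0
    state = world.observe()
    for _ in range(steps):
        next_state = world.apply(agent.act(state))
        total += agent.reward(state, next_state)
        state = next_state
    return state, total


final_state, total_reward = run_decoupled(StateOnlyWorld(), GoalSeekingAgent(3), 5)
```

   Whether this split is actually better is a separate question; Gym's
   choice does make it easy to compare agents on the same fixed task.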
   Reiteration:
   - Agent receives state S0 from the Environment
   - Based on S0, agent takes action A0
   - Environment has new frame, state S1
   - Environment gives reward R1 to the agent.
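   The four bullets above can be sketched in plain Python without Gym
   at all. This is only an illustration: environment_step, random_agent
   and rollout are hypothetical names, and the 1-D world with a goal at
   position 3 is an assumption I made up to have something to step
   through.

```python
import random


def environment_step(state, action):
    """Hypothetical 1-D world: move left or right, reward on reaching 3."""
    next_state = state + action
    reward = 1.0 if next_state == 3 else 0.0
    return next_state, reward


def random_agent(state, rng):
    """Picks a direction at random; a real agent would use the state."""
    return rng.choice([-1, 1])


def rollout(num_steps, seed=0):
    rng = random.Random(seed)
    state = 0  # S0, handed to the agent by the environment
    trajectory = []
    for _ in range(num_steps):
        action = random_agent(state, rng)  # A_t, chosen from S_t
        next_state, reward = environment_step(state, action)  # S_t+1, R_t+1
        trajectory.append((state, action, reward, next_state))
        state = next_state
    return trajectory


for s, a, r, s_next in rollout(5):
    print(f"state={s} action={a:+d} reward={r} next_state={s_next}")
```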
   Steps of using Gym:
   - create environment using gym.make()
   - reset environment to initial state with observation = env.reset()
   At each step:
   - get an action using policy model
   - using env.step(action), get from the environment: observation (the
   new state), reward, done (whether the episode terminated), info (an
   additional info dict)
   If the episode is done, the environment is reset to its initial state
   with observation = env.reset().
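   To see the reset()/step() contract from the environment's side, here
   is a small stand-in written without importing gym. CorridorEnv,
   run_steps and always_right are hypothetical names of mine; the class
   only mimics the four-value step() described above. As far as I know,
   Gym 0.26 and later (and Gymnasium) changed this: reset() returns
   (observation, info) and step() returns five values, with done split
   into terminated and truncated.

```python
class CorridorEnv:
    """Hypothetical stand-in with the classic Gym reset()/step() shape."""

    def __init__(self, length=4, max_steps=20):
        self.length = length
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        # back to the initial state; returns the first observation
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        # action: 0 = left, 1 = right; the left wall is at position 0
        move = 1 if action == 1 else -1
        self.position = max(0, self.position + move)
        self.steps += 1
        reached_goal = self.position >= self.length
        timed_out = self.steps >= self.max_steps
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or timed_out
        info = {"timed_out": timed_out and not reached_goal}
        return self.position, reward, done, info


def run_steps(env, policy, total_steps):
    """Step the environment, resetting whenever an episode ends."""
    observation = env.reset()
    episodes_finished = 0
    for _ in range(total_steps):
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        if done:
            episodes_finished += 1
            observation = env.reset()
    return episodes_finished


def always_right(observation):
    return 1


finished = run_steps(CorridorEnv(length=4), always_right, 12)
print("episodes finished:", finished)
```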
   This is very normative openai stuff that looks like it was read off a
   Gym example from their readme or such.
   It's interesting that huggingface is building their own libraries to
   pair with this course as it progresses. I wonder if some of that
   normativeness will shift toward increased utility even more.
   Here's a retype of the first example code:
   import gym
   # create environment
   env = gym.make('LunarLander-v2')
   # reset environment
   observation = env.reset()
   for _ in range(20):
     # take random action
     action = env.action_space.sample()
     print('Action taken:', action)
     # do action and get next state, reward, etc
     observation, reward, done, info = env.step(action)
     # if the game is done (land, crash, timeout)
     if done:
       # reset
       print('Environment is reset')
       observation = env.reset()

References

   1. https://huggingface.co/models?pipeline_tag=reinforcement-learning&sort=downloads