The lab says huggingface's model hub, which I mostly use as a remote server to store pretrained language models and to send data to my government on when and where I use them, now has deep reinforcement learning models available at https://huggingface.co/models?pipeline_tag=reinforcement-learning&sort=downloads
Here's the import code, retyped:
import gym
from huggingface_sb3 import load_from_hub, package_to_hub, push_to_hub
from huggingface_hub import notebook_login # for uploading to account from notebook
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.env_util import make_vec_env
Of course, uploading to the hub is possibly a very bad idea unless you are an experienced activist or researcher or spy, or have something important to share with your government or huggingface, or are only doing this casually and might get a job in it one day.
The lab then provides an intro to Gym, which is a python library that openai made that has the effect of making it hard to take technologies out of research, in the opinion of my pessimistic half, by verbosifying the construction of useful environments under an assumption they are only for testing model architectures.
The lab says Gym is used a lot, and provides:
- an interface to create RL environments
- a collection of environments
This is true.
They visually redescribe that an agent performs actions in an environment, which then returns a reward and a new state to the agent.
This coupling of reward with the environment, rather than with the agent, which would usually have goals itself, is part of the verbosifying, possibly. Maybe environment is more "environment interface", I'm actually having trouble thinking here. I always get confused around gym environments. Maybe jocks make better programmers nowadays.
Reiteration:
- Agent receives state S0 from the Environment
- Based on S0, agent takes action A0
- Environment has new frame, state S1
- Environment gives reward R1 to the agent.
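To make that cycle concrete for myself, here is a minimal sketch that does not use Gym at all. LineWorld and run_episode are hypothetical names I made up for illustration; the only thing borrowed from Gym is the shape of reset() and the four-value step() return that the lab describes.

```python
import random


class LineWorld:
    """Hypothetical toy environment: walk along positions 0..4, goal at 4."""

    def __init__(self, size=5, max_steps=20):
        self.size = size
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position  # initial state S0

    def step(self, action):
        # action 1 moves right, anything else moves left; walls clamp the move
        move = 1 if action == 1 else -1
        self.position = min(self.size - 1, max(0, self.position + move))
        self.steps += 1
        reached_goal = self.position == self.size - 1
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.steps >= self.max_steps
        return self.position, reward, done, {}


def run_episode(env, policy):
    """Run one episode; return a list of (state, action, reward, next_state)."""
    trajectory = []
    state = env.reset()                 # agent receives S0
    done = False
    while not done:
        action = policy(state)          # based on the current state, pick an action
        next_state, reward, done, _ = env.step(action)  # env yields next state and reward
        trajectory.append((state, action, reward, next_state))
        state = next_state
    return trajectory


rng = random.Random(0)  # seeded so the random policy is repeatable
trajectory = run_episode(LineWorld(), lambda state: rng.choice([0, 1]))
for t, (s, a, r, s_next) in enumerate(trajectory):
    print(f"S{t}={s}  A{t}={a}  R{t + 1}={r}  S{t + 1}={s_next}")
```

Note that the reward is computed inside step(), which is the coupling of reward with environment mentioned above.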
Steps of using Gym:
- create environment using gym.make()
- reset environment to initial state with observation = env.reset()
At each step:
- get an action using policy model
- using env.step(action), get from the environment: observation (the new state), reward, done (if episode terminated), info (additional info dict)
If episode is done, the environment is reset to its initial state with observation = env.reset().
This is very normative openai stuff that looks like it was read off a Gym example from their readme or such.
It's interesting that huggingface is building their own libraries to pair with this course as it progresses. I wonder if some of that normativeness will shift further toward increased utility.
Here's a retype of the first example code:
import gym
# create environment
env = gym.make('LunarLander-v2')
# reset environment
observation = env.reset()
for _ in range(20):
    # take random action
    action = env.action_space.sample()
    print('Action taken:', action)
    # do action and get next state, reward, etc
    observation, reward, done, info = env.step(action)
    # if the game is done (land, crash, timeout)
    if done:
        # reset
        print('Environment is reset')
        observation = env.reset()
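One caveat on the loop above: it unpacks four values from env.step(), which matches the Gym version the lab was written against. As far as I know, Gym 0.26 and later (and the Gymnasium fork) return five values from step(), splitting done into terminated and truncated, and return an (observation, info) pair from reset(). Here is a small sketch of adapters that accept either shape. unpack_step and unpack_reset are hypothetical helpers of my own, not part of any library, and the reset check is only a heuristic.

```python
def unpack_step(result):
    """Normalize an env.step() result to (observation, reward, done, info)."""
    if len(result) == 5:
        # newer API: terminated and truncated are reported separately
        observation, reward, terminated, truncated, info = result
        return observation, reward, terminated or truncated, info
    # older API: a single done flag
    observation, reward, done, info = result
    return observation, reward, done, info


def unpack_reset(result):
    """Normalize an env.reset() result to just the observation.

    Heuristic: treat a 2-tuple whose second item is a dict as (observation, info).
    """
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result[0]
    return result
```

With these, the loop body would read observation, reward, done, info = unpack_step(env.step(action)) and observation = unpack_reset(env.reset()), whichever version is installed.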