The lab says huggingface's model hub, which I mostly use as a remote server to store pretrained language models and to send data to my government on when and where I use them, now has deep reinforcement learning models available at https://huggingface.co/models?pipeline_tag=reinforcement-learning&sort=downloads

Here's the import code, retyped:

import gym

from huggingface_sb3 import load_from_hub, package_to_hub, push_to_hub

from huggingface_hub import notebook_login # for uploading to account from notebook

from stable_baselines3 import PPO

from stable_baselines3.common.evaluation import evaluate_policy

from stable_baselines3.common.env_util import make_vec_env

Of course, uploading to the hub is possibly a very bad idea unless you are an experienced activist or researcher or spy, or have something important to share with your government or huggingface, or are only doing this casually and might get a job in it one day.

The lab then provides an intro to Gym, which is a Python library that OpenAI made. In the opinion of my pessimistic half, it has the effect of making it hard to take technologies out of research, by verbosifying the construction of useful environments under an assumption that they are only for testing model architectures.

The lab says Gym is used a lot, and provides:

- an interface to create RL environments

- a collection of environments

This is true.

They visually redescribe that an agent performs actions in an environment, which then returns a reward and a state to the agent.

This coupling of reward with environment, rather than the agent which would usually have goals itself, is part of the verbosifying, possibly. Maybe environment is more "environment interface", I'm actually having trouble thinking here. I always get confused around gym environments. Maybe jocks make better programmers nowadays.
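To pin down what I mean by the coupling, here is a minimal sketch of the alternative, where the world only knows its dynamics and the goal is supplied from outside. Everything here (Dynamics, GoalWrapper, reward_fn, done_fn) is a hypothetical name I made up for illustration; none of it is part of Gym, and it only imitates the shape of Gym's step() return value.

```python
class Dynamics:
    """Hypothetical world: state transitions only, no notion of reward."""

    def __init__(self):
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def advance(self, action):
        # the action is an integer offset added to the position
        self.position += action
        return self.position


class GoalWrapper:
    """Attaches an externally chosen goal to the dynamics.

    The result exposes a Gym-shaped step(): (observation, reward, done, info).
    """

    def __init__(self, dynamics, reward_fn, done_fn):
        self.dynamics = dynamics
        self.reward_fn = reward_fn
        self.done_fn = done_fn

    def reset(self):
        return self.dynamics.reset()

    def step(self, action):
        observation = self.dynamics.advance(action)
        reward = self.reward_fn(observation)
        done = self.done_fn(observation)
        return observation, reward, done, {}


if __name__ == "__main__":
    # same dynamics, goal chosen by whoever wraps it: reach position 3
    env = GoalWrapper(Dynamics(),
                      reward_fn=lambda p: -abs(p - 3),
                      done_fn=lambda p: p == 3)
    env.reset()
    print(env.step(1))
    print(env.step(2))
```

Seen this way, a Gym environment is roughly the wrapper and the dynamics fused into one object, which may be why "environment interface" feels like the better name.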

Reiteration:

- Agent receives state S0 from the Environment

- Based on S0, agent takes action A0

- Environment has new frame, state S1

- Environment gives reward R1 to the agent.
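The indexing in that loop (S0, A0, then R1 and S1) is easier for me to see written out. This is a small sketch with a made-up toy world, not anything from the lab: toy_transition and rollout are hypothetical names, the state is just an integer, and the random choice stands in for a real policy.

```python
import random


def toy_transition(state, action):
    """Hypothetical dynamics: the state is an integer, the action is -1 or +1."""
    next_state = state + action
    reward = 1.0 if action > 0 else -1.0
    return next_state, reward


def rollout(num_steps, seed=0):
    """Record (t, S_t, A_t, R_{t+1}, S_{t+1}) for each step."""
    rng = random.Random(seed)
    state = 0  # S0, the initial state handed to the agent
    trajectory = []
    for t in range(num_steps):
        action = rng.choice([-1, 1])  # A_t, chosen after seeing S_t
        next_state, reward = toy_transition(state, action)  # S_{t+1}, R_{t+1}
        trajectory.append((t, state, action, reward, next_state))
        state = next_state
    return trajectory


if __name__ == "__main__":
    for t, s, a, r, s_next in rollout(3):
        print(f"S{t}={s}  A{t}={a:+d}  R{t+1}={r:+.1f}  S{t+1}={s_next}")
```

The reward carries the index of the state it arrives with, which is why the first reward is R1 rather than R0.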

Steps of using Gym:

- create environment using gym.make()

- reset environment to initial state with observation = env.reset()

At each step:

- get an action using policy model

- using env.step(action), get from the environment: observation (the new state), reward, done (whether the episode terminated), info (additional info dict)

If the episode is done, the environment is reset to its initial state with observation = env.reset().
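To check my own understanding of that call pattern without installing anything, here is a sketch of a stand-in environment. CountdownEnv and run are hypothetical names of mine and are not part of Gym; the class only mimics the reset() and four-value step() shape described above, and the random choice stands in for a policy model.

```python
import random


class CountdownEnv:
    """Hypothetical stand-in for a Gym environment (not part of gym).

    reset() returns an observation.
    step(action) returns (observation, reward, done, info).
    """

    def __init__(self, start=3):
        self.start = start
        self.remaining = start

    def reset(self):
        self.remaining = self.start
        return self.remaining

    def step(self, action):
        # action 1 moves one step closer to the end, action 0 does nothing
        self.remaining -= action
        done = self.remaining <= 0
        reward = 1.0 if done else 0.0
        info = {"remaining": self.remaining}
        return self.remaining, reward, done, info


def run(env, num_steps, seed=0):
    """Step the environment, resetting whenever an episode ends."""
    rng = random.Random(seed)
    episodes_finished = 0
    observation = env.reset()
    for _ in range(num_steps):
        action = rng.choice([0, 1])  # stand-in for a policy model
        observation, reward, done, info = env.step(action)
        if done:
            episodes_finished += 1
            observation = env.reset()
    return episodes_finished


if __name__ == "__main__":
    print("episodes finished:", run(CountdownEnv(), num_steps=50))
```

The point is only that the caller, not the environment, is responsible for noticing done and calling reset() again.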

This is very normative openai stuff that looks like it was read off a Gym example from their readme or such.

It's interesting that huggingface is building their own libraries to pair with this course as it progresses. I wonder if some of that normativeness will shift toward increased utility even more.

Here's a retype of the first example code:

import gym

# create environment

env = gym.make('LunarLander-v2')

# reset environment

observation = env.reset()

for _ in range(20):

    # take random action

    action = env.action_space.sample()

    print('Action taken:', action)

    # do action and get next state, reward, etc

    observation, reward, done, info = env.step(action)

    # if the game is done (land, crash, timeout)

    if done:

        # reset

        print('Environment is reset')

        observation = env.reset()