[ot][spam] Behavior Log For Compliance Examples: HFRL Unit 2

Fri Jun 24 08:48:20 PDT 2022

1139

Q1: What is Reinforcement Learning?

My guess: a strategy for automatically accomplishing tasks by training
policies to select actions from observations of an environment so as
to maximize their reward.

Q2: Define the RL Loop

- Our Agent receives ____ from the environment. guess: an observation
- Based on that ____ the Agent takes an _____ guess: observation, action
- Our Agent will move to the right
- The Environment goes to a _____ guess: new state
- The Environment gives ____ to the Agent  guess: reward

Solution:

- Our Agent receives state s0 from the environment
- Based on that state s0 the Agent takes an action a0
- Our Agent will move to the right
- The Environment goes to a new state s1
- The Environment gives a reward r1 to the Agent

Noting: the reward is associated with the new state the environment
moved to. There is no r0.
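
A minimal sketch of the loop above, assuming a made-up five-cell
corridor where the agent starts at cell 0 and is rewarded on reaching
cell 4. The names Corridor, run_episode and always_right are
hypothetical and chosen only for illustration; they are not from the
course.

```python
class Corridor:
    """Hypothetical toy environment: cells 0..4, goal at the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos  # initial state s0

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to the corridor
        self.pos = max(0, min(self.length - 1, self.pos + action))
        done = self.pos == self.length - 1
        reward = 1.0 if done else 0.0  # the reward belongs to the new state
        return self.pos, reward, done


def run_episode(env, policy, max_steps=100):
    state = env.reset()                               # agent receives s0
    trajectory = []
    for _ in range(max_steps):
        action = policy(state)                        # agent picks a_t from s_t
        next_state, reward, done = env.step(action)   # env returns s_{t+1}, r_{t+1}
        trajectory.append((state, action, reward, next_state))
        state = next_state
        if done:
            break
    return trajectory


def always_right(state):
    return +1  # "Our Agent will move to the right"


traj = run_episode(Corridor(), always_right)
```

Each tuple in traj is (s_t, a_t, r_{t+1}, s_{t+1}), which matches the
note that the first reward is r1 and there is no r0.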

Q3: What's the difference between a state and an observation?

guess: The state is the entire situation of the environment. The
observation is what the Agent receives.
guess 2: No difference

Q4: A task is an instance of a Reinforcement Learning problem. What
are the two types of tasks?

I don't know this one. I guess looking it up would be a helpful behavior.

I looked it up. They can be episodic or continuing. An episodic task
has a start state and a terminal state. A continuing task goes on
indefinitely, with no terminal state.
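
A small sketch of the difference, assuming two made-up step functions:
one reaches a terminal state, the other never does, so the rollout has
to be cut off by a step limit. All names here are hypothetical.

```python
def episodic_step(state):
    # Hypothetical episodic task: state 3 is terminal.
    next_state = state + 1
    return next_state, next_state == 3


def continuing_step(state):
    # Hypothetical task with no terminal state: cycles 0, 1, 2 forever.
    return (state + 1) % 3, False


def rollout(step, start=0, max_steps=10):
    state, visited = start, [start]
    for _ in range(max_steps):
        state, done = step(state)
        visited.append(state)
        if done:          # only an episodic task ever sets this
            break
    return visited


episodic_visits = rollout(episodic_step)
ongoing_visits = rollout(continuing_step)
```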

Q5: What is the exploration/exploitation tradeoff?

guess: the dilemma an agent faces when deciding whether to take
purportedly random actions so as to gather more data, or to select those
it knows to have the highest return.
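
One common way to handle this tradeoff is an epsilon-greedy rule, which
is not named in the quiz but fits the guess above: with probability
epsilon take a random action (explore), otherwise take the action with
the highest estimated return (exploit). A minimal sketch; the function
name and the action_values list are assumptions for illustration.

```python
import random


def epsilon_greedy(action_values, epsilon, rng=random):
    """Return an action index given estimated returns per action."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(action_values))
    # Exploit: pick the action with the highest estimated return.
    return max(range(len(action_values)), key=lambda a: action_values[a])


estimates = [0.1, 0.9, 0.3]  # hypothetical estimated returns
greedy_choice = epsilon_greedy(estimates, epsilon=0.0)
random_choice = epsilon_greedy(estimates, epsilon=1.0, rng=random.Random(0))
```

With epsilon=0.0 the rule never explores; with epsilon=1.0 it always
does.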

Q6: What is a policy?

guess: A function or trained model which selects actions based on state.

Q7: What are value-based methods?

guess: approaches to RL where each state is associated with a value,
and actions are selected to move toward the highest-valued states.
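
A sketch of that guess, assuming a hand-written table of state values
and a known transition function for a four-cell corridor. In practice
the values would be learned; the numbers and names here are invented.

```python
# Hypothetical state values: higher means closer to the goal at cell 3.
state_values = {0: 0.0, 1: 0.1, 2: 0.5, 3: 1.0}


def next_state(state, action):
    # action is -1 (left) or +1 (right), clamped to cells 0..3
    return max(0, min(3, state + action))


def greedy_action(state, actions=(-1, +1)):
    # Pick the action whose resulting state has the highest value.
    return max(actions, key=lambda a: state_values[next_state(state, a)])


choice_from_1 = greedy_action(1)
choice_from_0 = greedy_action(0)
```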

Q8: What are policy-based methods?

guess: approaches to RL where actions are selected directly, rather
than through a formal association between state and value.
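
A sketch of that idea, assuming the policy is a table of per-state
action preferences turned into probabilities with a softmax. The
preference numbers stand in for learned parameters and are invented;
no value table is consulted when choosing an action.

```python
import math
import random


def softmax(prefs):
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical policy parameters: one preference per action, per state.
preferences = {0: [0.0, 2.0], 1: [1.0, 1.0]}


def policy(state):
    # Maps a state directly to a probability for each action.
    return softmax(preferences[state])


def sample_action(state, rng=random):
    probs = policy(state)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs_state0 = policy(0)
probs_state1 = policy(1)
```

Training a policy-based method would adjust the preferences
themselves, rather than first estimating values and deriving actions
from them.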