[ot][spam] Behavior Log For Compliance Examples: HFRL Unit 2

Fri Jun 24 07:56:48 PDT 2022

1052

I went to the help desk for aid staying on task. They might need to add
something to their FAQ, or maybe reorder it; I'm not sure.

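The Bellman Equation simplifies the calculation of the state-value and
state-action value functions. Instead of summing all future rewards
separately for every state, it expresses a state's value in terms of the
value of the next state.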
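The simplification comes from the return being recursive. Writing the discounted return from time t and factoring out gamma from every term after the first gives:

```latex
\begin{aligned}
G_t &= R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots \\
    &= R_{t+1} + \gamma \left( R_{t+2} + \gamma R_{t+3} + \cdots \right) \\
    &= R_{t+1} + \gamma G_{t+1}
\end{aligned}
```

So the return from one step can be built from the return of the next step rather than from a fresh sum over the rest of the episode. Taking the expectation under the policy turns this into the Bellman equation for V, and the same idea applies to the state-action value Q.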

The examples in this section are simplified by not discounting the reward (gamma = 1).

Note: It is not hard to get the reward for each state in order to sum
them, because the environment provides this information. I may have
confused the terms "reward" and "return" in earlier notes.

The return is the sum of the rewards collected from that point onward by following the policy (in general, each later reward is discounted by gamma).
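A minimal sketch of the reward/return distinction, assuming a single episode whose per-step rewards have already been collected from the environment into a plain list. The helper name `episode_return` and the reward values are hypothetical, made up for illustration. With `gamma=1.0` it matches the undiscounted examples in this section; a smaller `gamma` gives the discounted return.

```python
from typing import Sequence


def episode_return(rewards: Sequence[float], gamma: float = 1.0) -> float:
    """Hypothetical helper: G_t = R_{t+1} + gamma*R_{t+2} + gamma^2*R_{t+3} + ...

    `rewards` holds the rewards received after time t, in order.
    gamma = 1.0 reproduces the undiscounted (simplified) case.
    """
    # The k-th reward after time t is weighted by gamma**k.
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


# Hypothetical episode: three steps costing -1 each, then +10 at a goal.
rewards = [-1.0, -1.0, -1.0, 10.0]

print(rewards[0])                    # one reward: -1.0
print(episode_return(rewards))       # undiscounted return: 7.0
print(episode_return(rewards, 0.9))  # discounted return, about 4.58
```

Each reward is a single number from one step; the return is the whole (possibly discounted) sum.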

Bellman Equation: $V(S_t) = R_{t+1} + \gamma V(S_{t+1})$
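A minimal sketch of why the recursive form saves work. It assumes a hypothetical deterministic chain of states in which the policy always moves to the next state and each step yields a known reward, so no expectation is needed. The function names and reward values are invented for illustration. `naive_values` sums the whole remaining reward sequence from scratch for every state, while `bellman_values` reuses V(S_{t+1}), so each state costs one multiply and one add.

```python
from typing import List, Sequence


def naive_values(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """Hypothetical: V(S_t) by summing every future reward from scratch (O(n^2))."""
    n = len(rewards)
    values = []
    for t in range(n):
        # rewards[t] is R_{t+1}, the reward for leaving state S_t.
        g = sum((gamma ** k) * rewards[t + k] for k in range(n - t))
        values.append(g)
    return values


def bellman_values(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """Hypothetical: V(S_t) = R_{t+1} + gamma * V(S_{t+1}), filled in from the end (O(n))."""
    n = len(rewards)
    values = [0.0] * (n + 1)  # values[n] is the terminal state, worth 0
    for t in range(n - 1, -1, -1):
        values[t] = rewards[t] + gamma * values[t + 1]
    return values[:n]


# Hypothetical chain S_0 -> S_1 -> S_2 -> S_3 -> terminal.
rewards = [-1.0, -1.0, -1.0, 10.0]
print(naive_values(rewards))    # [7.0, 8.0, 9.0, 10.0]
print(bellman_values(rewards))  # [7.0, 8.0, 9.0, 10.0]
```

Both give the same values; the Bellman version just avoids recomputing the tail sum for each state.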

The value of a state is the