Re: [ot][spam] Behavior Log For Compliance Examples: HFRL Unit 2

24 Jun 2022

      1052

I reviewed the help desk to get their aid staying on task. They might
need to add something to their FAQ, not sure, or maybe reorder it.

The Bellman Equation simplifies the calculation of state-value and
state-action value.

The examples in this section are simplified, removing discounting of the reward.

Note: It is not too hard to calculate a reward for a state in order to
sum them. The environment provides this information. I may have
confused the terms "reward" and "return" in earlier notes.

The return is the sum of the rewards collected while following the policy.
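
To keep "reward" and "return" apart, here is a minimal sketch of the return as a sum of per-step rewards. The function name compute_return and the reward list are made up for illustration; they are not from the course material. With gamma left at 1.0 it matches the simplified, undiscounted examples.

```python
def compute_return(rewards, gamma=1.0):
    """Sum the rewards of one trajectory.

    rewards: per-step rewards in the order they were received.
    gamma:   discount factor; 1.0 means no discounting.
    """
    total = 0.0
    for k, reward in enumerate(rewards):
        # Each reward is weighted by gamma**k, where k counts steps from the start.
        total += (gamma ** k) * reward
    return total


# Hypothetical rewards handed out by the environment along one episode.
episode_rewards = [1, 0, 2, 3]
print(compute_return(episode_rewards))  # 6.0
```

Each number in the list is a reward; the single number that comes out is the return.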

Bellman Equation: V(S_t) = R_{t+1} + gamma * V(S_{t+1})
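
A rough sketch of how the Bellman equation can be applied, assuming a single deterministic path of states that ends in a terminal state with value 0 (so there is no expectation to average over). The function name state_values and the rewards are hypothetical. Working backward from the end, each state's value needs only the next reward and the next state's value, not the whole remaining sum.

```python
def state_values(rewards, gamma=1.0):
    """Apply V(S_t) = R_{t+1} + gamma * V(S_{t+1}) backward along one path.

    rewards[t] stands for R_{t+1}, the reward received after leaving state S_t.
    The terminal state is assumed to have value 0.
    """
    values = [0.0] * (len(rewards) + 1)  # the last entry is the terminal state
    for t in reversed(range(len(rewards))):
        values[t] = rewards[t] + gamma * values[t + 1]
    return values


# Hypothetical rewards along a four-step path, undiscounted (gamma = 1).
print(state_values([1, 0, 2, 3]))  # [6.0, 5.0, 5.0, 3.0, 0.0]
```

The first entry, 6.0, equals the plain sum of all the rewards, which is the return from the starting state.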

The value of a state is the
