[ot][spam] Behavior Log For Compliance Examples: HFRL Unit 2

Fri Jun 24 08:06:35 PDT 2022

1104

- this is the last section of part 1
- there are two ways of learning

Monte Carlo and Temporal Difference Learning are two different
training strategies based on the experiences of the agent.

Monte Carlo uses an entire episode of experiences. Temporal Difference
uses a single step: one transition, i.e. the quadruple (state, action,
reward, next state).
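A minimal sketch of the difference, assuming a tabular state-value
function stored in a dict and the usual textbook update rules:
Monte Carlo moves V(s) toward the full return G_t once the episode has
ended, while TD(0) moves V(s) toward r + gamma * V(s') after every single
step. The function names, the dict representation, and the toy episode
are hypothetical illustrations, not code from the course.

```python
from typing import Dict, List, Tuple

# One transition: (state, action, reward, next_state).
Transition = Tuple[str, str, float, str]


def monte_carlo_update(values: Dict[str, float], episode: List[Transition],
                       alpha: float, gamma: float) -> None:
    """Every-visit Monte Carlo: only runs once the whole episode is known."""
    g = 0.0
    # Walk backwards so g accumulates the discounted return G_t.
    for state, _action, reward, _next_state in reversed(episode):
        g = reward + gamma * g
        v = values.get(state, 0.0)
        values[state] = v + alpha * (g - v)


def td0_update(values: Dict[str, float], transition: Transition,
               alpha: float, gamma: float) -> None:
    """TD(0): update from one transition, bootstrapping from V(next_state).

    Assumption: terminal states never get an entry, so their value is 0.
    """
    state, _action, reward, next_state = transition
    v = values.get(state, 0.0)
    target = reward + gamma * values.get(next_state, 0.0)
    values[state] = v + alpha * (target - v)


# Toy episode: A -> B -> end, with a reward of 1 only on the last step.
episode = [("A", "right", 0.0, "B"), ("B", "right", 1.0, "end")]

mc_values: Dict[str, float] = {}
monte_carlo_update(mc_values, episode, alpha=0.5, gamma=1.0)

td_values: Dict[str, float] = {}
for step in episode:  # TD can update during the episode, step by step
    td0_update(td_values, step, alpha=0.5, gamma=1.0)

print(mc_values)  # {'B': 0.5, 'A': 0.5}
print(td_values)  # {'A': 0.0, 'B': 0.5}
```

After one pass, Monte Carlo has already credited A with the final reward,
because it saw the whole return. TD(0) has not yet, because when A was
updated it bootstrapped from V(B), which was still 0; the reward only
propagates back to A on later episodes.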

One of the sentences in the course text could imply that these two
strategies might also apply to policy-based approaches, not only to
value-based ones.