Re: [ot][spam] Behavior Log For Compliance Examples: HFRL Unit 2

24 Jun 2022

      1104

- this is the last section of part 1
- there are two ways of learning

Monte Carlo and Temporal Difference Learning are two different
training strategies based on the experiences of the agent.

Monte Carlo uses an entire episode of experiences before it updates.
Temporal Difference updates from a single step (a quadruple of state,
action, reward, next-state).
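
A rough sketch of the difference, assuming a tabular state-value
function stored in a dict. The names mc_update and td0_update, the
learning rate alpha, and the discount gamma are made up for this
illustration and are not taken from the course code. The Monte Carlo
version needs the whole episode to compute the return; the TD(0) version
needs only one (state, action, reward, next-state) step and bootstraps
from the current estimate of the next state.

```python
from collections import defaultdict


def mc_update(values, episode, alpha=0.1, gamma=0.99):
    """Every-visit Monte Carlo update, applied once the episode has ended.

    episode is a list of (state, action, reward, next_state) steps.
    """
    g = 0.0
    # Walk backwards so g holds the discounted return from each step onward.
    for state, _action, reward, _next_state in reversed(episode):
        g = reward + gamma * g
        values[state] += alpha * (g - values[state])
    return values


def td0_update(values, step, alpha=0.1, gamma=0.99, terminal=False):
    """TD(0) update from a single step, usable before the episode ends."""
    state, _action, reward, next_state = step
    # Bootstrap from the current estimate of the next state (0 if terminal).
    bootstrap = 0.0 if terminal else values[next_state]
    target = reward + gamma * bootstrap
    values[state] += alpha * (target - values[state])
    return values


# Hypothetical two-step episode, for illustration only.
episode = [("s0", "right", 0.0, "s1"), ("s1", "right", 1.0, "end")]

mc_values = mc_update(defaultdict(float), episode, alpha=0.5, gamma=1.0)

td_values = defaultdict(float)
td0_update(td_values, episode[0], alpha=0.5, gamma=1.0)
td0_update(td_values, episode[1], alpha=0.5, gamma=1.0, terminal=True)
```

After one pass, Monte Carlo has already credited s0 with the final
reward, while TD(0) has only moved s1; s0 would catch up on later
episodes as the estimate for s1 propagates back.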

One of the sentences in the unit could imply that these might also
apply to policy-based approaches.

Undiscussed Horrific Abuse, One Victim of Many