[ot][spam] Behavior Log For Compliance Examples: HFRL Unit 2

Undiscussed Horrific Abuse, One Victim of Many gmkarl at gmail.com
Fri Jun 24 08:06:35 PDT 2022


1104

- this is the last section of part 1
- there are two ways of learning from experience

Monte Carlo and Temporal Difference Learning are two different
training strategies based on the experiences of the agent.

Monte Carlo uses an entire episode of experiences. Temporal Difference
uses a single step (a quadruple of state, action, reward, next-state).
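A minimal sketch of the difference, assuming a tabular state-value
function kept in a Python dict, a constant learning rate alpha and a
discount factor gamma. The function names are hypothetical and this is
not the course's own code. Monte Carlo waits until the episode is over
and moves each V(s) toward the actual discounted return. TD(0) updates
from one quadruple right away, bootstrapping from the current estimate
of the next state.

```python
# Hypothetical helpers: V is a dict mapping state -> estimated value.
# Each experience is a (state, action, reward, next_state) quadruple.

def monte_carlo_update(V, episode, alpha=0.1, gamma=0.99):
    """Every-visit Monte Carlo: needs the whole finished episode."""
    G = 0.0
    # Walk backwards so G is the discounted sum of rewards from this step on.
    for state, _action, reward, _next_state in reversed(episode):
        G = reward + gamma * G
        old = V.get(state, 0.0)
        V[state] = old + alpha * (G - old)
    return V


def td0_update(V, transition, alpha=0.1, gamma=0.99, terminal=False):
    """TD(0): needs only a single (state, action, reward, next_state) step."""
    state, _action, reward, next_state = transition
    # Bootstrap: use the current estimate of the next state as a stand-in
    # for the rest of the return (zero if the next state is terminal).
    next_value = 0.0 if terminal else V.get(next_state, 0.0)
    td_target = reward + gamma * next_value
    old = V.get(state, 0.0)
    V[state] = old + alpha * (td_target - old)
    return V


if __name__ == "__main__":
    # A made-up two-step episode: only the final step is rewarded.
    episode = [("s0", "right", 0.0, "s1"), ("s1", "right", 1.0, "s2")]

    V_mc = monte_carlo_update({}, episode, alpha=0.5, gamma=1.0)
    print(V_mc)  # {'s1': 0.5, 's0': 0.5}

    V_td = {}
    td0_update(V_td, episode[0], alpha=0.5, gamma=1.0)
    td0_update(V_td, episode[1], alpha=0.5, gamma=1.0, terminal=True)
    print(V_td)  # {'s0': 0.0, 's1': 0.5}
```

After one pass over the same episode, Monte Carlo has already credited
s0 with the final reward, while TD has not yet propagated it back from
s1. TD, however, could update at every step without waiting for the
episode to end.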

One of the sentences in the unit could be read as implying that these
strategies also apply to policy-based approaches.


More information about the cypherpunks mailing list