[ot][spam] Behavior Log For Compliance Examples: HFRL Unit 2

Undiscussed Horrific Abuse, One Victim of Many gmkarl at gmail.com
Fri Jun 24 08:37:47 PDT 2022


1129

Summary

There are two types of value-based functions
- State-Value function gives a value for every state
- Action-Value function gives a value for every pair of a state and
an action taken when leaving that state.
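
To make the difference concrete for myself, here is a minimal sketch of
the two as lookup tables. The states, actions, and numbers are made up
for illustration and are not from the reading; greedy_action is a
hypothetical helper name.

```python
# State-value table: one number per state, V(s).
# All states, actions, and values here are invented for illustration.
state_value = {"s0": 0.5, "s1": 0.8, "s2": 1.0}

# Action-value table: one number per (state, action) pair, Q(s, a).
action_value = {
    ("s0", "left"): 0.2,
    ("s0", "right"): 0.5,
    ("s1", "left"): 0.3,
    ("s1", "right"): 0.8,
}


def greedy_action(q, state, actions):
    """Pick the action with the highest action-value in this state."""
    return max(actions, key=lambda a: q[(state, a)])


print(greedy_action(action_value, "s0", ["left", "right"]))
```

The action-value table is larger, but it lets an action be chosen
directly by comparing the values for one state.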

There are two methods used to learn a value function.
- Assuming the return does not rely on the timestep or the path taken,
the Monte Carlo approach uses the complete, actual return, but it only
updates after a complete episode
- With TD learning, the value function is updated every step, but the
return is not known yet, so it is estimated as
next_reward + discount * old_next_return, where old_next_return is the
current value estimate for the next state. (discount is gamma)
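
The two update rules above can be sketched roughly like this. This is
my own minimal sketch, not code from the reading: the function names,
the dict-based value table, and the learning rate alpha are
assumptions, and the Monte Carlo version updates on every visit to a
state.

```python
def monte_carlo_update(value, episode, gamma, alpha):
    """Update values from one finished episode.

    episode is a list of (state, reward) pairs, where reward is what
    was received after leaving that state. alpha (learning rate) is an
    assumption; the notes above do not name it.
    """
    g = 0.0
    # Walk backwards so g is the complete discounted return from each state.
    for state, reward in reversed(episode):
        g = reward + gamma * g
        value[state] += alpha * (g - value[state])


def td0_update(value, state, reward, next_state, gamma, alpha, done=False):
    """Update one state's value after a single step.

    The target is next_reward + discount * old estimate of the next
    state, so no complete episode is needed.
    """
    next_value = 0.0 if done else value[next_state]
    target = reward + gamma * next_value
    value[state] += alpha * (target - value[state])


mc_values = {"a": 0.0, "b": 0.0}
monte_carlo_update(mc_values, [("a", 0.0), ("b", 1.0)], gamma=0.5, alpha=1.0)

td_values = {"a": 0.0, "b": 2.0}
td0_update(td_values, "a", 1.0, "b", gamma=0.5, alpha=0.5)
```

Monte Carlo has to wait for the whole list of steps, while the TD
update only needs one step plus the existing estimate for the next
state.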

The reading states that it is normal if the parts are all still
confusing. It does say to take time to grasp them before moving on. I
did not include all the terms and equations in my notes.

There is a link for feedback at https://forms.gle/3HgA7bEHwAmmLfwh9 .

There is a quiz. I remember now there is also a quiz for unit 1. I am
holding the intention of going back and doing quiz 1, which I don't
remember well.

