[ot][spam] Behavior Log For Compliance Examples: HFRL Unit 2
Undiscussed Horrific Abuse, One Victim of Many
gmkarl at gmail.com
Fri Jun 24 07:44:33 PDT 2022
1041
State value function
V_Pi(s) = E_Pi[G_t|S_t = s]
The policy value of state s is the expected policy return if the agent
starts at state s.
The equation doesn't seem very helfpul.
Second description of equation:
For each state, the state-value fucntion outputs the expected return,
if the agent starts in that state, and then follows the policy forever
after.
A graphic is shown of a mouse in a tiny maze finding cheese. Each
square has a number representing the negative number of steps needed
to reach the cheese. This step is the value.
At this point, it is pretty easy to imagine a function recursively
updating the values of every square in such a maze until they
stabilise, to arrive at the picture.
More information about the cypherpunks
mailing list