[ot][spam] Behavior Log For Compliance Examples: HFRL Unit 2

Fri Jun 24 07:41:19 PDT 2022

1038 I am now on the state-value function section at
https://huggingface.co/blog/deep-rl-q-part1#the-state-value-function .

The bit of information I missed writing down in the last section was
that in value-based methods the policy is defined by hand (for example,
a greedy policy over the value estimates), whereas the value function is
modularised as a neural network; in policy-based methods, the policy
itself is the neural network. [limiting hardcoded heuristics]
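
A rough sketch of the value-based half of that distinction, to make it
concrete for myself. The names here (greedy_policy, epsilon_greedy_policy,
q_values) are my own and not from the course, and the small dict is only a
stand-in for the value estimates a trained network would produce. The point
is that the policy is a few lines of hand-written code sitting on top of the
value function:

```python
import random

def greedy_policy(q_values, state):
    """Hand-written policy: pick the action with the highest estimated value."""
    action_values = q_values[state]
    return max(action_values, key=action_values.get)

def epsilon_greedy_policy(q_values, state, epsilon, rng=random):
    """Hand-written policy with exploration: random action with probability epsilon."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values[state]))
    return greedy_policy(q_values, state)

# Hypothetical value estimates; in a deep value-based method these numbers
# would come from a neural network rather than a fixed table.
q_values = {
    "s0": {"left": 0.1, "right": 0.7},
    "s1": {"left": 0.4, "right": 0.2},
}

print(greedy_policy(q_values, "s0"))
print(epsilon_greedy_policy(q_values, "s1", epsilon=0.1))
```

In a policy-based method there would be no such hand-written rule: the
network itself would map a state to an action (or to action probabilities).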