1129 Summary

There are two types of value-based functions:
- The state-value function gives a value for every state.
- The action-value function gives a value for taking a specific action in a specific state.

There are two methods used to learn a value function:
- The Monte Carlo approach uses the complete, actual return, but it can only update after a complete episode.
- With TD learning, the value function is updated at every step, but the return is estimated as next_reward + discount * estimated_value_of_next_state (the discount is gamma).

The reading states that it is normal if the parts still all feel confusing, and that this is fine. It does say to take time to grasp it before moving on. I did not include all the terms and equations in my notes.

There is a link for feedback at https://forms.gle/3HgA7bEHwAmmLfwh9 . There is a quiz. I remember now there is also a quiz for unit 1. I intend to go back and do quiz 1, which I don't remember well.
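To make the contrast concrete for myself, here is a minimal sketch of the two update rules on a made-up three-state chain. The state names, rewards, and the gamma/alpha values are my own assumptions for illustration, not from the course.

```python
GAMMA = 0.9  # discount factor (the "discount" / gamma in the notes)
ALPHA = 0.5  # learning rate (assumed for this sketch)

def monte_carlo_update(V, episode):
    """Update V only after a complete episode, using the actual return G."""
    G = 0.0
    # Walk the episode backwards so G accumulates the discounted return.
    for state, reward in reversed(episode):
        G = reward + GAMMA * G
        V[state] += ALPHA * (G - V[state])
    return V

def td0_update(V, state, reward, next_state):
    """Update V at every step, using the estimated return:
    next_reward + discount * current estimate of the next state's value."""
    td_target = reward + GAMMA * V[next_state]
    V[state] += ALPHA * (td_target - V[state])
    return V

# Usage: one episode s0 -> s1 -> terminal, with rewards 0 then 1.
V_mc = {"s0": 0.0, "s1": 0.0, "terminal": 0.0}
monte_carlo_update(V_mc, [("s0", 0.0), ("s1", 1.0)])
print(V_mc)  # V(s1) moves toward 1.0; V(s0) toward gamma * 1.0

V_td = {"s0": 0.0, "s1": 0.0, "terminal": 0.0}
td0_update(V_td, "s0", 0.0, "s1")        # no change yet: V(s1) is still 0
td0_update(V_td, "s1", 1.0, "terminal")  # V(s1) now moves toward 1.0
print(V_td)
```

Note how the Monte Carlo pass propagates value to s0 in one episode, while a single TD pass leaves V(s0) untouched because V(s1) was still zero when s0 was updated.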