Q1: What is Reinforcement Learning?
My guess: a strategy for automatically accomplishing tasks by training policies to select actions from observations of an environment so as to maximize reward.

Q2: Define the RL Loop
- Our Agent receives ____ from the environment. guess: an observation
- Based on that ____ the Agent takes an _____ guess: observation, action
- Our Agent will move to the right
- The Environment goes to a _____ guess: new state
- The Environment gives ____ to the Agent guess: reward
Solution:
- Our Agent receives state s0 from the environment
- Based on that state s0 the Agent takes an action a0
- Our Agent will move to the right
- The Environment goes to a new state s1
- The Environment gives a reward r1 to the Agent
Noting: the reward is associated with the new state the environment moved to. There is no r0.

Q3: What's the difference between a state and an observation?
guess: The state is the entire situation of the environment. The observation is what the Agent receives.
guess 2: No difference

Q4: A task is an instance of a Reinforcement Learning problem. What are the two types of tasks?
I don't know this one. I guess looking it up would be a helpful behavior.
I looked it up. They can be episodic or continuing. An episodic task has a start state and a terminal state. A continuing task goes on indefinitely, with no terminal state.

Q5: What is the exploration/exploitation tradeoff?
guess: the dilemma an agent faces when deciding whether to take seemingly random actions to gather more data (exploration) or to select the actions it already knows to have the highest return (exploitation).

Q6: What is a policy?
guess: A function or trained model that selects actions based on state.

Q7: What are value-based methods?
guess: approaches to RL where each state is associated with a value, and actions are selected to move toward the highest-valued states.

Q8: What are policy-based methods?
guess: approaches to RL where actions are selected directly, rather than making a formal association with state and value
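
Side note to self: a rough sketch of the RL loop from Q2, with the explore/exploit choice from Q5 folded into the policy from Q6. None of this is from the course. `Corridor`, `epsilon_greedy`, and `run_episode` are names I made up, the environment is a toy 5-cell corridor, and the value table is hand-written rather than learned, so treat it as an illustration under those assumptions only.

```python
import random


class Corridor:
    """Hypothetical toy environment: cells 0..4, start at 0, goal at the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        # Start of an episode: the agent receives the initial state s0.
        self.pos = 0
        return self.pos

    def step(self, action):
        # action 1 moves right, anything else moves left (clamped at the walls).
        move = 1 if action == 1 else -1
        self.pos = max(0, min(self.length - 1, self.pos + move))
        done = self.pos == self.length - 1  # terminal state, so the task is episodic
        reward = 1.0 if done else 0.0       # reward belongs to the NEW state
        return self.pos, reward, done


def epsilon_greedy(q_values, state, epsilon, rng):
    """Policy: explore with probability epsilon, otherwise exploit the value table."""
    if rng.random() < epsilon:
        return rng.randrange(2)             # explore: random action
    left, right = q_values[state]
    if left == right:
        return rng.randrange(2)             # tie: no preference, pick at random
    return 0 if left > right else 1         # exploit: best-known action


def run_episode(env, q_values, epsilon, rng, max_steps=100):
    """Run one episode and record (s_t, a_t, r_{t+1}, s_{t+1}) for each step."""
    state = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = epsilon_greedy(q_values, state, epsilon, rng)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward, next_state))
        state = next_state
        if done:
            break
    return trajectory


if __name__ == "__main__":
    # Hand-written values that prefer "right" in every cell (an assumption, not learned).
    q = {s: [0.0, 1.0] for s in range(5)}
    for s, a, r, s_next in run_episode(Corridor(), q, epsilon=0.1, rng=random.Random(0)):
        print(f"s={s} a={a} r={r} s'={s_next}")
```

With `epsilon=0.0` the agent always exploits and walks straight to the goal. Raising `epsilon` makes it wander more before getting there. Each recorded tuple pairs the reward with the state the environment moved to, which matches the "there is no r0" note under Q2.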