1148 solutions, excluding Q2, where I looked

Q1: What is Reinforcement Learning?
My guess: a strategy for automatically accomplishing tasks by training policies to select actions from observations of an environment so as to maximize their reward.
solution: a framework for solving control tasks or decision problems, by building agents that learn from the environment by interacting with it through trial and error and receiving rewards (positive or negative) as unique feedback.
https://huggingface.co/blog/deep-rl-intro#a-formal-definition

Q2: in last message. information at https://huggingface.co/blog/deep-rl-intro#the-rl-process

Q3: What's the difference between a state and an observation?
guess: The state is the entire situation of the environment. The observation is what the Agent receives.
guess 2: No difference
solution: The state is a complete description of the state of the world, without hidden information, in a fully observed environment. The observation is a partial description of the state, in a partially observed environment.
https://huggingface.co/blog/deep-rl-intro#observationsstates-space

[X]

Q4: A task is an instance of a Reinforcement Learning problem. What are the two types of tasks?
episodic or continuous
solution: Episodic task: we have a starting point and an ending point. Continuous task: these are tasks that continue forever.
https://huggingface.co/blog/deep-rl-intro#type-of-tasks

Q5: What is the exploration/exploitation tradeoff?
guess: the dilemma an agent faces when deciding whether to engage in purportedly random actions so as to gather more data, or to select those it knows of with the highest return
solution: The need to balance how much we explore the environment and how much we exploit what we know. Exploration is trying random actions to find more information. Exploitation is using known information to maximize reward.
https://huggingface.co/blog/deep-rl-intro#exploration-exploitation-tradeoff

[X]

Q6: What is a policy?
guess: A function or trained model which selects actions based on state.
solution: Policy Pi is the brain of the Agent. A function that tells what action to take given the state. It defines the agent's behavior at a given time.
https://huggingface.co/blog/deep-rl-intro#the-policy-%CF%80-the-agents-brain

Q7: What are value-based methods?
guess: approaches to RL where each state is associated with a value, and actions are selected to move toward the highest-valued states.
solution: Value-based methods are one of the main approaches. A value function is trained instead of a policy function; it maps a state to the expected value of being there.
https://huggingface.co/blog/deep-rl-intro#value-based-methods

Q8: What are policy-based methods?
guess: approaches to RL where actions are selected directly, rather than making a formal association with state and value
solution: A policy function is learned directly, to map from each state to the best corresponding action; or a probability distribution over the set of possible actions at that state.
https://huggingface.co/blog/deep-rl-intro#value-based-methods
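
To make the Q7/Q8 difference concrete for myself, here is a rough sketch in Python. Everything in it is made up for illustration and is not from the Hugging Face post: the 5-cell corridor, the hand-written `state_values` and `action_probs` tables, and the function names. In a real setup those tables would be learned, not typed in.

```python
import random

# Hypothetical toy environment: a corridor of 5 cells, reward at the right end.
ACTIONS = {"left": -1, "right": +1}
N_STATES = 5


def step(state, action):
    """Move one cell left or right, clamped to the ends of the corridor."""
    return max(0, min(N_STATES - 1, state + ACTIONS[action]))


# Value-based: what we hold is a value per state (hand-written here,
# normally trained). The policy is only implied: choose the action
# that leads to the highest-valued next state.
state_values = [0.0, 0.25, 0.5, 0.75, 1.0]


def value_based_policy(state):
    return max(ACTIONS, key=lambda a: state_values[step(state, a)])


# Policy-based: the policy itself is the learned object. Here it is a
# stochastic policy, i.e. a probability distribution over actions per state
# (again hand-written, purely as an assumption for the example).
action_probs = {s: {"left": 0.1, "right": 0.9} for s in range(N_STATES)}


def policy_based_policy(state, rng=random):
    actions = list(action_probs[state])
    weights = [action_probs[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

Calling `value_based_policy(2)` looks at the values of the neighbouring cells and picks the better one, while `policy_based_policy(2)` samples an action straight from the stored distribution without consulting any values.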