[ot][spam] Behavior Log For Compliance Examples: HFRL Unit 2
Undiscussed Horrific Abuse, One Victim of Many
gmkarl at gmail.com
Fri Jun 24 08:55:19 PDT 2022
1148 solutions, excluding Q2 where i looked
Q1: What is Reinforcement Learning?
My guess: a strategy for automatically accomplishing tasks by training
policies to select actions from observations of an environment so as
to maximize their reward.
solution: a framework for solving control tasks or decision problems,
by building agents that learn from the environment by interacting with
it through trial and error and receiving rewards (positive or
negative) as unique feedback.
https://huggingface.co/blog/deep-rl-intro#a-formal-definition
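A minimal sketch of that trial-and-error loop, in case it helps to see it run. CoinFlipEnv, run_episode and the reset/step interface are hypothetical names made up for illustration, not anything from the course material; the only point is that the agent acts, the environment replies with a positive or negative reward, and that reward is the sole feedback.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: the agent guesses each coin flip."""

    def __init__(self, seed=0, max_steps=10):
        self.rng = random.Random(seed)
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0  # a single dummy observation

    def step(self, action):
        coin = self.rng.randint(0, 1)
        reward = 1 if action == coin else -1  # positive or negative feedback
        self.steps += 1
        done = self.steps >= self.max_steps
        return 0, reward, done


def run_episode(env, policy):
    """Interact with env until it reports done; return the summed reward."""
    obs = env.reset()
    total = 0
    done = False
    while not done:
        action = policy(obs)                   # agent picks an action
        obs, reward, done = env.step(action)   # environment responds
        total += reward
    return total


# A fixed policy that always guesses 1; a learning agent would adjust
# its policy based on the rewards it receives.
total = run_episode(CoinFlipEnv(), lambda obs: 1)
```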
Q2: in last message. information at
https://huggingface.co/blog/deep-rl-intro#the-rl-process
Q3: What's the difference between a state and an observation?
guess: The state is the entire situation of the environment. The
observation is what the Agent receives.
guess 2: No difference
solution: The state is a complete description of the state of the
world, without hidden information in a fully observed environment. The
observation is a partial description of the state, in a partially
observed environment.
https://huggingface.co/blog/deep-rl-intro#observationsstates-space
[X]
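One way to picture the distinction, as a sketch only. The State class and both functions are hypothetical; the assumption is a one-dimensional world where a treasure position exists but may be hidden from the agent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Complete description of the (hypothetical) world."""
    agent_x: int
    treasure_x: int


def full_observation(state):
    # Fully observed environment: nothing is hidden from the agent.
    return (state.agent_x, state.treasure_x)


def partial_observation(state):
    # Partially observed environment: the treasure position is hidden.
    return (state.agent_x,)


s = State(agent_x=2, treasure_x=7)
```

Here full_observation(s) carries everything in the state, while partial_observation(s) drops the treasure position, so two different states can produce the same observation.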
Q4: A task is an instance of a Reinforcement Learning problem. What
are the two types of tasks?
episodic or continuous
solution: Episodic task: we have a starting point and an ending point.
Continuous task: these are tasks that continue forever.
https://huggingface.co/blog/deep-rl-intro#type-of-tasks
Q5: What is the exploration/exploitation tradeoff?
guess: the dilemma an agent faces when deciding whether to engage in
purportedly random actions so as to gather more data, or select those
it knows of with the highest return
solution: The need to balance how much we explore the environment and
how much we exploit what we know. Exploring is exploring by trying
random actions to find more information. Exploitation is exploiting
known information to maximize reward.
https://huggingface.co/blog/deep-rl-intro#exploration-exploitation-tradeoff
[X]
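One common way to strike this balance is epsilon-greedy action selection. The solution text above does not name it, so treat this as an assumed illustration: with probability epsilon the agent tries a random action, otherwise it takes the action with the highest current estimate. The q_values list is hypothetical data.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action index from a list of estimated action values."""
    if rng.random() < epsilon:
        # Explore: try a random action to gather more information.
        return rng.randrange(len(q_values))
    # Exploit: take the action currently believed to be best.
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
q_values = [0.1, 0.9, 0.3]  # hypothetical estimates for three actions
greedy_choice = epsilon_greedy(q_values, epsilon=0.0, rng=rng)
random_choice = epsilon_greedy(q_values, epsilon=1.0, rng=rng)
```

With epsilon=0.0 the agent only exploits; with epsilon=1.0 it only explores. Values in between trade one against the other.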
Q6: What is a policy?
guess: A function or trained model which selects actions based on state.
solution: Policy Pi is the brain of the Agent. A function that tells
what action to take given the state. It defines the agent's behavior
at a given time.
https://huggingface.co/blog/deep-rl-intro#the-policy-%CF%80-the-agents-brain
Q7: What are value-based methods?
guess: approaches to RL where each state is associated with a value,
and actions are selected to move toward the highest-valued states.
solution: Value-based methods are one of the main approaches. A value
function is trained instead of a policy function; it maps a state to
the expected value of being there.
https://huggingface.co/blog/deep-rl-intro#value-based-methods
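A small sketch of how a value function can drive behavior. The table V, the transitions map and greedy_action are all hypothetical, and the environment is assumed to be deterministic so each action leads to one known next state. Nothing here is trained; the values are filled in by hand to show how they would be used once learned.

```python
# Hypothetical state values, as a value function might estimate them.
V = {"start": 0.0, "left": 1.0, "right": 5.0}

# Hypothetical deterministic transitions: state -> {action: next_state}.
transitions = {"start": {"go_left": "left", "go_right": "right"}}


def greedy_action(state, values, transitions):
    """Choose the action whose next state has the highest value."""
    options = transitions[state]
    return max(options, key=lambda action: values[options[action]])


best = greedy_action("start", V, transitions)
```

No policy is stored explicitly; the behavior falls out of comparing state values.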
Q8: What are policy-based methods?
guess: approaches to RL where actions are selected directly, rather
than making a formal association with state and value
solution: A policy function is learned directly, to map from each
state to the best corresponding action; or a probability distribution
over the set of possible actions at that state.
https://huggingface.co/blog/deep-rl-intro#value-based-methods
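The two forms in that solution can be sketched side by side. The preference table is hypothetical, and softmax is just one assumed way to turn per-action preferences into a probability distribution; in a real policy-based method these numbers would come from a trained model.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into probabilities that sum to 1."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def deterministic_policy(prefs_by_state, state):
    """Map a state directly to its single preferred action."""
    prefs = prefs_by_state[state]
    return max(range(len(prefs)), key=lambda a: prefs[a])


def stochastic_policy(prefs_by_state, state, rng):
    """Sample an action from a distribution over actions at this state."""
    probs = softmax(prefs_by_state[state])
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


prefs_by_state = {"s0": [0.0, 2.0, 1.0]}  # hypothetical learned preferences
probs = softmax(prefs_by_state["s0"])
action_det = deterministic_policy(prefs_by_state, "s0")
action_sto = stochastic_policy(prefs_by_state, "s0", random.Random(0))
```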