[ot][spam] Behavior Log For Compliance Examples: HFRL Unit 2

Undiscussed Horrific Abuse, One Victim of Many gmkarl at gmail.com
Fri Jun 24 08:55:19 PDT 2022


1148 solutions, excluding Q2 where i looked

Q1: What is Reinforcement Learning?

My guess: a strategy for automatically accomplishing tasks by training
policies to select actions from observations of an environment so as
to maximize their reward.

solution: a framework for solving control tasks or decision problems,
by building agents that learn from the environment by interacting with
it through trial and error and receiving rewards (positive or
negative) as unique feedback.
https://huggingface.co/blog/deep-rl-intro#a-formal-definition
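
A rough sketch of the interaction loop that definition describes. The
CoinFlipEnv class, its reset/step methods, and the fixed policy are all
made up for illustration; nothing here is from the course, and no
learning happens, it only shows the state -> action -> reward cycle.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: guess a hidden coin flip each step."""

    def __init__(self, episode_length=10, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0  # a single dummy state

    def step(self, action):
        coin = self.rng.randint(0, 1)
        reward = 1 if action == coin else -1  # positive or negative feedback
        self.steps += 1
        done = self.steps >= self.episode_length
        return 0, reward, done


def run_episode(env, policy):
    """Run one episode: observe state, act, receive reward, repeat."""
    state = env.reset()
    total_reward = 0
    done = False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
    return total_reward


env = CoinFlipEnv()
total = run_episode(env, lambda state: 1)  # a policy that always guesses 1
print(total)
```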

Q2: in last message. information at
https://huggingface.co/blog/deep-rl-intro#the-rl-process

Q3: What's the difference between a state and an observation?

guess: The state is the entire situation of the environment. The
observation is what the Agent receives.
guess 2: No difference

solution: The state is a complete description of the state of the
world, without hidden information in a fully observed environment. The
observation is a partial description of the state, in a partially
observed environment.

https://huggingface.co/blog/deep-rl-intro#observationsstates-space

[X]
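
To make the state/observation distinction concrete, here is a small
sketch. The corridor world, the dict layout, and both observe_*
functions are assumptions for illustration only.

```python
# Full state of a hypothetical 1-D corridor: agent position, goal position, length.
state = {"agent": 2, "goal": 7, "length": 10}


def observe_fully(state):
    """Fully observed environment: the agent receives the complete state."""
    return dict(state)


def observe_partially(state, view_radius=1):
    """Partially observed: only cells within view_radius of the agent are visible."""
    lo = max(0, state["agent"] - view_radius)
    hi = min(state["length"] - 1, state["agent"] + view_radius)
    return ["G" if cell == state["goal"] else "." for cell in range(lo, hi + 1)]


print(observe_fully(state))
print(observe_partially(state))
```

With the default view radius the goal cell lies outside the window, so
the observation carries less information than the state.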

Q4: A task is an instance of a Reinforcement Learning problem. What
are the two types of tasks?

episodic or continuous

solution: Episodic task: we have a starting point and an ending point.
Continuous task: these are tasks that continue forever.

https://huggingface.co/blog/deep-rl-intro#type-of-tasks

Q5: What is the exploration/exploitation tradeoff?

guess: the dilemma an agent faces when deciding whether to engage in
purportedly random actions so as to gather more data, or select those
it knows of with the highest return

solution: The need to balance how much we explore the environment and
how much we exploit what we know. Exploration means trying random
actions to find more information about the environment. Exploitation
means using known information to maximize reward.
https://huggingface.co/blog/deep-rl-intro#exploration-exploitation-tradeoff

[X]
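
One common way to balance the two is an epsilon-greedy rule. That rule
is not named in the text above, so treat this as an assumed example;
the value estimates are invented numbers.

```python
import random


def epsilon_greedy(value_estimates, epsilon, rng):
    """With probability epsilon explore (random action), otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(value_estimates))  # explore
    # exploit: pick the action with the highest estimated value
    return max(range(len(value_estimates)), key=lambda a: value_estimates[a])


rng = random.Random(0)
estimates = [0.1, 0.5, 0.2]  # hypothetical per-action value estimates

greedy_choice = epsilon_greedy(estimates, epsilon=0.0, rng=rng)  # pure exploitation
choices = [epsilon_greedy(estimates, epsilon=1.0, rng=rng) for _ in range(200)]  # pure exploration
print(greedy_choice, sorted(set(choices)))
```

Setting epsilon somewhere between 0 and 1 mixes the two behaviors.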

Q6: What is a policy?

guess: A function or trained model which selects actions based on state.

solution: Policy Pi is the brain of the Agent. A function that tells
what action to take given the state. It defines the agent's behavior
at a given time.
https://huggingface.co/blog/deep-rl-intro#the-policy-%CF%80-the-agents-brain

Q7: What are value-based methods?

guess: approaches to RL where each state is associated with a value,
and actions are selected to move toward the highest-valued states.

solution: Value-based methods are one of the main approaches. A value
function is trained instead of a policy function; it maps a state to
the expected value of being there.
https://huggingface.co/blog/deep-rl-intro#value-based-methods
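
A sketch of how a value function can drive behavior: act greedily
toward the highest-valued reachable state. The five-state world and the
hand-picked values are assumptions; a real value-based method would
learn these values rather than hard-code them.

```python
# Hypothetical 1-D world with states 0..4; higher values sit nearer state 4.
state_values = {0: 0.0, 1: 0.25, 2: 0.5, 3: 0.75, 4: 1.0}
actions = {"left": -1, "right": +1}


def next_state(state, action):
    """Move one cell, clamped to the ends of the world."""
    return min(4, max(0, state + actions[action]))


def greedy_action(state):
    """Choose the action whose resulting state has the highest value."""
    return max(actions, key=lambda a: state_values[next_state(state, a)])


print(greedy_action(2))
```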

Q8: What are policy-based methods?

guess: approaches to RL where actions are selected directly, rather
than making a formal association with state and value

solution: A policy function is learned directly, to map from each
state to the best corresponding action; or a probability distribution
over the set of possible actions at that state.
https://huggingface.co/blog/deep-rl-intro#value-based-methods
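
A sketch of the two forms that solution mentions: a deterministic
mapping from state to action, and a probability distribution over
actions. The preference table and the softmax parameterization are
assumptions for illustration; in practice the preferences would be
learned.

```python
import math
import random

# Hypothetical per-state action preferences.
preferences = {
    "s0": {"left": 0.0, "right": 2.0},
    "s1": {"left": 1.0, "right": 1.0},
}


def action_probabilities(state):
    """Softmax over preferences: a probability distribution over actions."""
    exps = {a: math.exp(p) for a, p in preferences[state].items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def deterministic_policy(state):
    """Always return the most probable action for the state."""
    probs = action_probabilities(state)
    return max(probs, key=probs.get)


def stochastic_policy(state, rng):
    """Sample an action according to the distribution for the state."""
    probs = action_probabilities(state)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


print(deterministic_policy("s0"))
print(action_probabilities("s1"))
print(stochastic_policy("s1", random.Random(0)))
```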

