[ot][spam][crazy] draft: learning RL

Undiscussed Horrific Abuse, One Victim of Many gmkarl at gmail.com
Mon May 9 05:12:24 PDT 2022


On Mon, May 9, 2022, 8:05 AM Undiscussed Horrific Abuse, One Victim of Many
<gmkarl at gmail.com> wrote:

> To represent normal goal behavior with maximization, the
This is all confusing to me, but normally when we meet goals we don't
influence things unrelated to the goal. That constraint is not usually
included in maximization, unless

> return function needs to not only be incredibly complex, but
the return to be maximized were to include them, maybe by otherwise
always being 1.0; I don't really know.
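
A minimal sketch of what that could look like, assuming a toy setup
where changes to goal-unrelated state are measurable; every name here
is hypothetical, not any existing library's API:

# Hypothetical sketch: a return that is 1.0 by default and only drops
# when the agent disturbs state unrelated to the goal.
def side_effect_return(base_return, unrelated_changes):
    # base_return: the ordinary task return, assumed scaled to [0, 1]
    # unrelated_changes: magnitudes of changes to goal-unrelated state,
    # each assumed scaled to [0, 1]
    penalty = 1.0
    for change in unrelated_changes:
        penalty *= 1.0 - change  # any side effect shrinks the return
    return base_return * penalty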

> also feed back to its own evaluation, in a way not
Maybe this relates to not learning habits unrelated to the goal, habits
that would influence other goals badly.

> provided for in these libraries.
But something different is thinking at this time. It is the role of one
part of a mind to try to relate with the other parts. Improving this in
a general way is likely well known to be important.


> Daydreaming: I'm thinking of how in reality and normality, we have many
> many goals going at once (most of them "common sense" and/or "staying being
> a living human").  Similarly, I'm thinking of how with normal transformer
> models, one trains according to a loss rather than a reward.
>
> I'm considering: what if the interesting case were when an agent _fails_
> to meet a goal? Its reward would usually be full, 1.0, but would be
> multiplied by losses when goals are not met.
>
> This seems much nicer to me.
>
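A minimal sketch of that daydream, assuming each goal exposes a loss
in [0, 1] the way a clamped transformer training loss might; the
function name and setup are made up for illustration:

# Hypothetical sketch: reward starts full at 1.0 and is multiplied
# down by a factor for each goal whose loss is nonzero.
def multi_goal_reward(goal_losses):
    # goal_losses: per-goal losses, 0.0 meaning the goal is fully met
    reward = 1.0
    for loss in goal_losses:
        clamped = min(max(loss, 0.0), 1.0)
        reward *= 1.0 - clamped
    return reward

# e.g. three goals, one slightly missed:
# multi_goal_reward([0.0, 0.1, 0.0]) -> 0.9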