[ot][spam][crazy] draft: learning RL

Mon May 9 05:05:24 PDT 2022

> To represent normal goal behavior with maximization, the return function
> needs to not only be incredibly complex, but also feed back to its own
> evaluation, in a way not provided for in these libraries.

Daydreaming: I'm thinking of how, in normal life, we have a great many
goals going at once (most of them "common sense" and/or "staying a living
human"). Similarly, I'm thinking of how normal transformer models are
trained to minimize a loss rather than to maximize a reward.

I'm wondering whether it would be more interesting to focus on when an
agent _fails_ to meet a goal. Its reward would usually be full, 1.0, but
would be multiplied down by a loss for each goal that is not met.

This seems much nicer to me.
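
A rough sketch of the idea in plain Python, not tied to any RL library.
The function name and the per-goal loss list are made up for illustration,
and I'm assuming each goal reports a loss between 0 and 1 that scales the
reward by (1 - loss). That is only one possible reading of "multiply by
losses".

```python
from typing import Sequence


def multiplicative_reward(goal_losses: Sequence[float]) -> float:
    """Start from a full reward of 1.0 and shrink it for each unmet goal.

    Assumption: each entry is a per-goal loss in [0, 1], where 0.0 means
    the goal is fully met and 1.0 means it failed completely.
    """
    reward = 1.0
    for loss in goal_losses:
        if not 0.0 <= loss <= 1.0:
            raise ValueError(f"loss must be in [0, 1], got {loss}")
        # A met goal (loss 0.0) leaves the reward untouched;
        # a failed goal scales it down.
        reward *= 1.0 - loss
    return reward


# All goals met: the reward stays full.
print(multiplicative_reward([0.0, 0.0, 0.0]))
# One goal half missed: the reward is halved.
print(multiplicative_reward([0.5, 0.0, 0.0]))
```

With this shape, a single completely failed goal zeroes the whole reward, so
the many "common sense" goals cost nothing while they are met and dominate
as soon as one is not.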
