On Mon, May 9, 2022, 8:12 AM Undiscussed Horrific Abuse, One Victim of Many <gmkarl@gmail.com> wrote:
On Mon, May 9, 2022, 8:05 AM Undiscussed Horrific Abuse, One Victim of Many <gmkarl@gmail.com> wrote:
>> > This is all confused to me, but normally when we meet goals we don't influence things not related to the goal. This is not usually included in maximization, unless the return to be maximized were to include them, maybe by always being 1.0, I don't really know.
>> >
>> > Maybe this relates to not learning habits unrelated to the goal, habits that would influence other goals badly.
>> >
>> > But something different is thinking at this time. It is the role of a part of a mind to try to relate with the other parts. Improving this in a general way is likely well known to be important.

To represent normal goal behavior with maximization, the return function would need not only to be incredibly complex, but also to feed back to its own evaluation, in a way not provided for in these libraries.
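As a sketch of what "feed back to its own evaluation" could look like (everything here is hypothetical, made up just for illustration; the usual gym-style env.step(action) interface hands back a fixed reward and has no slot for the agent's own estimates):

    # hypothetical sketch: a return that depends on the agent's own evaluation,
    # so changing the agent changes its own reward signal
    def task_reward(state, action):
        # stand-in for the ordinary, goal-related part of the return
        return 1.0 if action == "meet_goal" else 0.0

    def self_referential_return(state, action, value_estimate):
        # the agent's own value estimate feeds back into the return it maximizes
        return task_reward(state, action) * value_estimate(state)

    # toy usage with a made-up value estimate
    print(self_referential_return("s0", "meet_goal", lambda s: 0.9))  # prints 0.9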
Daydreaming: I'm thinking of how, in normal reality, we have many, many goals going at once (most of them "common sense" and/or "staying a living human"). Similarly, I'm thinking of how, with normal transformer models, one trains according to a loss rather than a reward.
I'm considering: what if it were more interesting when an agent _fails_ to meet a goal? Its reward would usually be full, 1.0, but would be multiplied by losses when goals are not met.
This seems much nicer to me.
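A rough Python sketch of the multiplicative reward; the exp(-loss) factor is just my guess at one way "multiply by losses" could work, assuming each goal reports a nonnegative loss that is 0.0 when the goal is met:

    import math

    def multiplicative_reward(goal_losses):
        # reward is full (1.0) by default and only shrinks when goals are missed
        reward = 1.0
        for loss in goal_losses.values():
            reward *= math.exp(-loss)  # met goals (loss 0.0) leave the reward untouched
        return reward

    # most goals met, one partially missed
    print(multiplicative_reward({"stay alive": 0.0, "common sense": 0.0, "task": 0.3}))  # about 0.74

Since exp(-a) * exp(-b) = exp(-(a + b)), pushing this product up is the same as pushing the summed loss down, which fits the guess below that it would mostly just learn at a different rate.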
I don't know how RL works, since I haven't taken the course, but from a distance it looks to me like it would just learn at a different (slower) rate [with other differences].
yes