On Mon, May 9, 2022, 8:14 AM Undiscussed Horrific Abuse, One Victim of Many <gmkarl@gmail.com> wrote:


On Mon, May 9, 2022, 8:12 AM Undiscussed Horrific Abuse, One Victim of Many <gmkarl@gmail.com> wrote:


On Mon, May 9, 2022, 8:05 AM Undiscussed Horrific Abuse, One Victim of Many <gmkarl@gmail.com> wrote:
To represent normal goal behavior with maximization, the

This is all confusing to me, but normally when we meet goals we don't influence things unrelated to the goal. That is not usually included in maximization, unless

return function needs to not only be incredibly complex, but

the return being maximized were to include them, maybe by always being 1.0; I don't really know.

also feed back to its own evaluation, in a way not

Maybe this relates to not learning habits unrelated to the goal, habits that would influence other goals badly.

provided for in these libraries.

But something different is thinking at this time. It is the role of one part of a mind to try to relate to the other parts. Improving this in a general way is likely well known to be important.


Daydreaming: I'm thinking of how in reality and normality, we have many, many goals going at once (most of them "common sense" and/or "staying a living human"). Similarly, I'm thinking of how with normal transformer models, one trains according to a loss rather than a reward.

I'm considering: what if it were more interesting when an agent _fails_ to meet a goal? Its reward would usually be full, 1.0, but would be multiplied by losses when goals are not met.

This seems much nicer to me.
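
To make that concrete for myself, here is a tiny Python sketch of one reading of it; the goal names and penalty numbers are made up, and a met goal just contributes a factor of 1.0:

def multiplicative_reward(goal_penalties):
    # reward starts full at 1.0; each unmet goal multiplies in a
    # penalty factor in (0, 1], so met goals (factor 1.0) leave it alone
    reward = 1.0
    for penalty in goal_penalties.values():
        reward *= penalty
    return reward

# example: two goals fully met, one partially missed
print(multiplicative_reward({
    "stay_alive": 1.0,
    "common_sense": 1.0,
    "task_at_hand": 0.7,
}))  # -> 0.7

One thing I like about the multiplication is that a single badly-missed goal can pull the whole reward toward zero, rather than being averaged away by the goals that were met.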

I don't know how RL works since I haven't taken the course, but from a distance it looks to me like it would just learn at a different (slower) rate [with other differences].
> yes
> I think it relates to the other inhibited concept, of value vs. action learning. A reward starts at just the event of interest, for example, but the system then learns to apply reward to things that can relate to the event, like preceding time points [states].
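
A toy picture of that spreading-backward behavior, in Python: this is roughly the tabular TD(0) update from RL textbooks, with a made-up three-state chain where only the final transition carries a reward.

states = ["early", "middle", "event_of_interest"]
V = {s: 0.0 for s in states}   # learned value of each state
alpha, gamma = 0.5, 0.9        # learning rate and discount factor

# one episode, replayed a few times: only the last transition,
# into the event of interest, carries a reward
episode = [("early", "middle", 0.0),
           ("middle", "event_of_interest", 1.0)]

for sweep in range(10):
    for s, s_next, r in episode:
        # nudge V[s] toward the reward plus the discounted value of the
        # next state, so the reward leaks backward to preceding states
        V[s] += alpha * (r + gamma * V[s_next] - V[s])

print(V)  # "middle" ends up near 1.0, "early" near 0.9 of that

After a few sweeps, the state just before the event carries almost the full value, and the state before that carries a discounted share of it.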