On Mon, May 9, 2022, 8:12 AM Undiscussed Horrific Abuse, One Victim of Many <gmkarl@gmail.com> wrote:
On Mon, May 9, 2022, 8:05 AM Undiscussed Horrific Abuse, One Victim of Many <gmkarl@gmail.com> wrote:
>> > This is all confused to me, but normally when we meet goals we don't influence things not related to the goal. This is not usually included in maximization, unless the return to be maximized were to include them, maybe by always being 1.0, I don't really know.
>> >
>> > Maybe this relates to not learning habits unrelated to the goal, habits that would influence other goals badly.
>> >
>> > But something different is thinking at this time. It is the role of a part of a mind to try to relate with the other parts. Improving this in a general way is likely well known to be important.

To represent normal goal behavior with maximization, the return function would need not only to be incredibly complex, but also to feed back to its own evaluation, in a way not provided for in these libraries.
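As a sketch of what "feed back to its own evaluation" could look like (everything here is hypothetical, made up just for illustration; the usual gym-style env.step(action) interface hands back a fixed reward and has no slot for the agent's own estimates):

    # hypothetical sketch: a return that depends on the agent's own evaluation,
    # so changing the agent changes its own reward signal
    def task_reward(state, action):
        # stand-in for the ordinary, goal-related part of the return
        return 1.0 if action == "meet_goal" else 0.0

    def self_referential_return(state, action, value_estimate):
        # the agent's own value estimate feeds back into the return it maximizes
        return task_reward(state, action) * value_estimate(state)

    # toy usage with a made-up value estimate
    print(self_referential_return("s0", "meet_goal", lambda s: 0.9))  # prints 0.9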
Daydreaming: I'm thinking of how, in normal reality, we have many, many goals going at once (most of them "common sense" and/or "staying a living human"). Similarly, I'm thinking of how, with normal transformer models, one trains according to a loss rather than a reward.
I'm considering: what if it were more interesting when an agent _fails_ to meet a goal? Its reward would usually be full, 1.0, but would be multiplied by losses when goals are not met.
This seems much nicer to me.
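A rough Python sketch of the multiplicative reward; the exp(-loss) factor is just my guess at one way "multiply by losses" could work, assuming each goal reports a nonnegative loss that is 0.0 when the goal is met:

    import math

    def multiplicative_reward(goal_losses):
        # reward is full (1.0) by default and only shrinks when goals are missed
        reward = 1.0
        for loss in goal_losses.values():
            reward *= math.exp(-loss)  # met goals (loss 0.0) leave the reward untouched
        return reward

    # most goals met, one partially missed
    print(multiplicative_reward({"stay alive": 0.0, "common sense": 0.0, "task": 0.3}))  # about 0.74

Since exp(-a) * exp(-b) = exp(-(a + b)), pushing this product up is the same as pushing the summed loss down, which fits the guess below that it would mostly just learn at a different rate.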
I don't know how RL works, since I haven't taken the course, but from a distance it looks to me like it would just learn at a different (slower) rate [with other differences].
yes