9 May 2022
9:05 p.m.
To represent normal goal behavior with maximization, the return function
would need not only to be incredibly complex, but also to feed back into its own evaluation, in a way these libraries don't provide for.
Daydreaming: I'm thinking of how, in reality and normality, we have many, many goals going at once (most of them "common sense" and/or "staying being a living human"). Similarly, I'm thinking of how, with normal transformer models, one trains against a loss rather than a reward. I'm wondering whether it would be more interesting to focus on when an agent _fails_ to meet a goal: its reward would usually be full, 1.0, but would be multiplied down by losses whenever goals go unmet. This seems much nicer to me.
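A minimal sketch of how that might look, assuming each goal reports a loss in [0, 1] where 0.0 means "fully met", and that each goal scales the reward by a factor of (1 - loss). The function name and the example losses are mine, purely illustrative:

```python
def multiplicative_reward(goal_losses):
    """Reward starts at a full 1.0 and is scaled by (1 - loss) for
    each goal, so met goals (loss 0.0) leave it untouched and missed
    goals shrink it multiplicatively."""
    reward = 1.0
    for loss in goal_losses:
        reward *= 1.0 - loss
    return reward

# Example: several concurrent goals, most of them met.
losses = [0.0, 0.0, 0.1, 0.5]         # per-goal losses in [0, 1]
print(multiplicative_reward(losses))  # 1.0 * 1.0 * 0.9 * 0.5 = 0.45
```

One property of the multiplicative form: a single badly missed goal can collapse the reward toward zero no matter how well the other goals are doing, which fits the "staying being a living human" intuition above.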