On Mon, May 9, 2022, 4:22 AM Undiscussed Horrific Abuse, One Victim of Many <gmkarl@gmail.com> wrote:

To represent normal goal behavior with maximization, the return function needs not only to be incredibly complex, but also to feed back into its own evaluation, in a way these libraries do not provide for. The agent should treat anything inside the policy that can change as part of its environment state.

This is important enough that it should be done even if it doesn't help, because observing before acting matters in all situations. There is an unexpected conflict here between this combined goal of expressing more useful processes, and the safer practice of observation before influence.

I believe this is important (if acontextual), and wrong only in ways smaller than the eventual problems it reduces, but I understand that my perception is incorrect in some way.
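For concreteness, here is a minimal sketch of the two ideas together; the class name, shapes, and reward form are mine, not from the mail or from any particular RL library. The policy's mutable parameters are folded into the environment state the agent observes, and the reward depends on those same internals, so the return feeds back into its own evaluation rather than scoring the external world alone.

import numpy as np

class SelfObservingEnv:
    """Toy environment whose state includes the policy's own parameters."""

    def __init__(self, policy_params):
        self.world = np.zeros(2)            # ordinary external state
        self.policy_params = policy_params  # mutable policy internals, treated as state

    def observe(self):
        # Observation before action: the agent sees the world *and* itself.
        return np.concatenate([self.world, self.policy_params])

    def step(self, action):
        self.world += action
        # The return feeds back into its own evaluation: the reward depends on
        # the policy internals that produced the action, not on the world alone.
        reward = -np.sum(self.world ** 2) - 0.1 * np.sum(self.policy_params ** 2)
        return self.observe(), reward

params = np.array([0.5, -0.3])      # hypothetical policy parameters
env = SelfObservingEnv(params)

obs = env.observe()                 # observe before acting
action = params * obs[:2]           # trivial linear "policy" for illustration
obs, reward = env.step(-0.1 * action)
print(obs, reward)

In an off-the-shelf library the reward function is usually a fixed map from external state and action to a scalar, so the dependence on policy_params above is exactly the feedback loop the mail says is not provided for.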