On Mon, May 9, 2022, 4:40 AM Undiscussed Horrific Abuse, One Victim of Many <gmkarl@gmail.com> wrote:


On Mon, May 9, 2022, 4:38 AM Undiscussed Horrific Abuse, One Victim of Many <gmkarl@gmail.com> wrote:


On Mon, May 9, 2022, 4:22 AM Undiscussed Horrific Abuse, One Victim of Many <gmkarl@gmail.com> wrote:
To represent normal goal behavior with maximization, the return function must not only be incredibly complex, but must also feed back into its own evaluation, in a way these libraries do not provide for.

Anything inside the policy that can change should be included as part of the environment state.
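
As a minimal sketch of what this might look like, assuming a gym-style environment interface (the class and every name below are hypothetical illustrations, not drawn from any particular library): the policy's mutable internals are exposed as part of the environment state, and evaluating the return perturbs how future returns will be evaluated.

import numpy as np

class SelfReferentialEnv:
    """Toy environment where mutable policy internals are part of the
    observed state, and where computing the return feeds back into how
    later returns are evaluated."""

    def __init__(self, policy_params):
        self.world_state = np.zeros(4)
        # Anything inside the policy that can change is tracked as
        # environment state alongside the external world state.
        self.policy_params = np.asarray(policy_params, dtype=float)
        self.reward_weights = np.ones(4)  # mutated by evaluation below

    def observe(self):
        # The observation exposes policy internals as environment state.
        return np.concatenate([self.world_state, self.policy_params])

    def step(self, action, new_policy_params):
        action = np.asarray(action, dtype=float)
        self.world_state = self.world_state + action
        # The policy changed this step; that change is state too.
        self.policy_params = np.asarray(new_policy_params, dtype=float)
        reward = float(self.reward_weights @ self.world_state)
        # Feedback: evaluating the return shifts the weights used to
        # evaluate future returns, a coupling that a fixed, externally
        # defined return function cannot express.
        self.reward_weights = self.reward_weights + 0.01 * action
        return self.observe(), reward

The point of the sketch is only that the return function is itself mutable state of the system being maximized, rather than a fixed external object, which is the coupling the standard library interfaces leave out.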

There is censorship here: many important parts of the idea are left out, and only one projection of the error is in focus.

The concern is a strong norm of acting prior to observing, a habit known to cause severe errors regardless of training and practice.


This is so important that it should be done even if it doesn't help, because observing before acting matters in all situations.

There is unexpected conflict around this combined expression of more useful processes and of safer observation before influence. I believe this is important (if acontextual), and wrong only in ways smaller than the eventual problems it reduces, but I understand that my perception is incorrect in some way.

I am hearing/guessing that the problem is that the information is designed for human consumption rather than automated consumption, and that the harm is significantly increased when automated consumption happens before human consumption.