[ot][spam][crazy] draft: learning RL

Mon May 9 01:40:03 PDT 2022

On Mon, May 9, 2022, 4:38 AM Undiscussed Horrific Abuse, One Victim of Many
<gmkarl at gmail.com> wrote:

>
>
> On Mon, May 9, 2022, 4:22 AM Undiscussed Horrific Abuse, One Victim of
> Many <gmkarl at gmail.com> wrote:
>
>>> To represent normal goal behavior with maximization, the return function
>>> needs to not only be incredibly complex, but also feed back to its own
>>> evaluation, in a way not provided for in these libraries.
>>>
>>
>> Anything inside the policy that can change should be included as part of
>> its environment state.
>>
>> This matters enough that it should be done even if it doesn't help,
>> because observing before acting is important in all situations.
>>
>
> There is unexpected conflict around this combined expression of more
> useful processes, and safer observation before influence. I believe this is
> important (if acontextual), and wrong only in ways that are smaller than
> the eventual problems it reduces, but I understand that my perception is
> incorrect in some way.
>

I am hearing/guessing that the problem is that the information is designed
for human consumption rather than automated consumption, and the harm is
significantly increased when automated consumption happens before human
consumption.
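
A minimal sketch of the quoted idea of treating the policy's changeable
internals as part of its environment state, so they are observed before any
action is chosen. Everything here is an assumption for illustration:
LinearPolicy, mutable_state and augmented_observation are hypothetical names,
not part of any RL library, and the toy policy only exposes its own weights
without learning from them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LinearPolicy:
    # Hypothetical policy: the weights are its only state that can change.
    weights: List[float] = field(default_factory=lambda: [0.5, -0.5])

    def mutable_state(self) -> List[float]:
        # Copy, so an observation cannot alias the live weights.
        return list(self.weights)

    def act(self, observation: List[float]) -> int:
        # zip() stops at the shorter sequence, so only the leading
        # environment entries are scored; the appended policy state is
        # visible in the observation but unused by this toy policy.
        score = sum(w * x for w, x in zip(self.weights, observation))
        return 1 if score > 0 else 0


def augmented_observation(env_obs: List[float], policy: LinearPolicy) -> List[float]:
    # Observation = environment state followed by the policy's mutable state.
    return list(env_obs) + policy.mutable_state()


policy = LinearPolicy()
env_obs = [1.0, 0.0]  # assumed toy environment observation

obs = augmented_observation(env_obs, policy)
action = policy.act(obs)
print(obs, action)

# Changing the policy changes what it observes on the next step.
policy.weights[0] = -1.0
obs = augmented_observation(env_obs, policy)
action = policy.act(obs)
print(obs, action)
```

The observation is rebuilt before every action, so any change to the policy
shows up in what it sees next. A return function that feeds back into its own
evaluation would need more than this; the sketch covers only the observation
side.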
