[ot][spam][crazy] draft: learning RL

Mon May 9 01:01:40 PDT 2022

>
> "Reward hypothesis" : All goals can be expressed as the maximization of an
> expected return.
>

Note: In my uneducated opinion, this hypothesis is _severely_ false.
Maximizing a return is a goal only if the goal was already the
maximization of a return. Goals are _parts_ of behavior, whereas
maximization of a return steers _all_ behavior around a _single_ value.

To represent normal goal behavior with maximization, the return function
needs to not only be incredibly complex, but also feed back to its own
evaluation, in a way not provided for in these libraries.
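
A minimal sketch of the distinction, assuming a toy setting of my own:
states are small integers and an episode is a plain list of visited
states. None of the names below (fixed_reward, ShiftingGoalReward) come
from any RL library; they are hypothetical. The first reward is the
usual fixed scalar signal. The second changes what it rewards once a
goal has been met, which is one crude way a return function could "feed
back to its own evaluation".

```python
from typing import Iterable, List, Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.5) -> float:
    """Sum of gamma**k * rewards[k] over one episode."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def fixed_reward(state: int) -> float:
    """Hypothetical fixed signal: state 3 is always worth 1."""
    return 1.0 if state == 3 else 0.0


class ShiftingGoalReward:
    """Hypothetical reward that moves on to the next goal once one is met.

    The reward for a state depends on what was rewarded before, so it is
    not a fixed function of the state alone.
    """

    def __init__(self, goals: Iterable[int]) -> None:
        self.goals: List[int] = list(goals)

    def __call__(self, state: int) -> float:
        if self.goals and state == self.goals[0]:
            self.goals.pop(0)  # goal satisfied; stop rewarding it
            return 1.0
        return 0.0


trajectory = [0, 3, 1, 3, 2]

fixed_rewards = [fixed_reward(s) for s in trajectory]

shifting = ShiftingGoalReward([3, 2])
shifting_rewards = [shifting(s) for s in trajectory]

print(fixed_rewards, discounted_return(fixed_rewards))
print(shifting_rewards, discounted_return(shifting_rewards))
```

With the fixed signal, revisiting state 3 pays again every time. With
the shifting one, the second visit pays nothing and state 2 becomes the
thing worth reaching. This is only an illustration of the idea, not a
claim about how any particular library should be extended.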

This false hypothesis is being actively used to suppress knowledge and use
of these technologies (see: AI alignment), because turning an optimizing
solver into a free agent reliably kills everybody. Nobody would do this
unless they were led to: humans experience satisfaction, conflicts
produce splash, and optimizing solvers are powerful enough, if properly
purposed with contextuality and briefness, to resolve the problems of
conflict.

Everybody asks why we do not have world peace if we have AI. It is
because we are only using it for the war of optimizing our own private
numbers, at the expense of anybody not involved.
