[ot][spam][crazy][data] transformer model 'attention' improvement

Undiscussed Horrific Abuse, Victim & Survivor of gmkarl at gmail.com
Sun Jan 30 06:19:48 PST 2022


[I rebooted x but it didn't boot back up, and I'm sending this from a phone.]

The next step for me here is to put together a logical verification that
implementing the feature completely counters the memory savings of chunked
attention. Given the rest of these threads it seems likely I can do that,
but even if I don't, it makes sense to assume the conclusion is probably
true and move forward: close the issues and pull request (they could be
replaced by one noting the reason for the "feature"), and change
transformers to disable attention output when chunking is engaged.
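
As a rough back-of-the-envelope sketch of that verification (my own
illustration, not code from the PR; n and chunk are arbitrary example
sizes):

    # With query chunking, only one (chunk, n) block of attention scores is
    # live at a time; returning the attention weights forces every block to
    # be kept, which is the full (n, n) matrix again.
    def peak_score_elements(n, chunk, output_attentions):
        per_chunk = chunk * n                 # scores for one query chunk
        if output_attentions:
            return (n // chunk) * per_chunk   # all chunks retained == n * n
        return per_chunk                      # only the current chunk is live

    n, chunk = 16384, 128
    print(peak_score_elements(n, chunk, False))  # 2_097_152   ~ O(n * chunk)
    print(peak_score_elements(n, chunk, True))   # 268_435_456 = n ** 2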

Commented on the PR:

> I've converted this to a draft because I think this "feature" may be
> ridiculous, making the memory usage O(n^2) again by retaining weights
> across chunks.
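
For context, here is a hypothetical sketch of query-chunked attention with
an attention-output flag (illustrative only, not the actual transformers or
PR code; the names chunk_size and output_attentions are just placeholders):

    import torch

    def chunked_attention(q, k, v, chunk_size=128, output_attentions=False):
        # q, k, v: (n, d)
        outputs, kept_weights = [], []
        for start in range(0, q.shape[0], chunk_size):
            q_chunk = q[start:start + chunk_size]                      # (c, d)
            scores = q_chunk @ k.transpose(0, 1) / q.shape[-1] ** 0.5  # (c, n)
            weights = scores.softmax(dim=-1)                           # (c, n)
            outputs.append(weights @ v)                                # (c, d)
            if output_attentions:
                # Retaining every chunk's weights rebuilds the full (n, n)
                # matrix, so peak memory is O(n^2) again.
                kept_weights.append(weights)
        out = torch.cat(outputs)
        attn = torch.cat(kept_weights) if output_attentions else None
        return out, attn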
