[ot][spam][crazy][data] transformer model 'attention' improvement

k gmkarl at gmail.com
Wed Jan 26 05:42:47 PST 2022


On 1/26/22, k <gmkarl at gmail.com> wrote:
> the AminRezaei0x443 implementation also produces the same data, attached
> again.
>
> the aminrezaei implementation applies the square-root scaling,
> provides for optional mask and bias tensors, is on pypi, and has both
> a jax and a torch implementation, so it seems the way to go.
>
> next i'll time it against the paper's implementation that i noted as
> speedy.  just on my raspberry pi, though.  i'm guessing they're
> roughly the same on good hardware with large models, where the core
> batched matmuls dominate everything.  sometimes i mostly engage stuff
> i bump into.
>
> maybe it would be good just to quickly run through the source and
> verify that aminrezaei does checkpointing and lax mapping like in the
> paper.
>
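the chunked trick those implementations use can be sketched in plain
numpy -- this is only a minimal illustration of the technique, not the
package's or the paper's actual code.  keys/values are processed in
fixed-size chunks while a running max and running normalizer keep the
softmax numerically stable, so the full n-by-n score matrix is never
materialized:

```python
import numpy as np

def naive_attention(q, k, v):
    # reference: softmax(q k^T / sqrt(d)) v, materializing the full score matrix
    d = q.shape[-1]
    s = q @ k.T / np.sqrt(d)
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ v

def chunked_attention(q, k, v, chunk=64):
    # walk over keys/values in chunks, carrying a running max (m),
    # running softmax normalizer (l), and running weighted value sum
    # (acc); memory per query is O(chunk) instead of O(n)
    d = q.shape[-1]
    m = np.full(q.shape[0], -np.inf)
    l = np.zeros(q.shape[0])
    acc = np.zeros((q.shape[0], v.shape[1]))
    for i in range(0, k.shape[0], chunk):
        s = q @ k[i:i+chunk].T / np.sqrt(d)  # scores for this chunk only
        m_new = np.maximum(m, s.max(axis=-1))
        scale = np.exp(m - m_new)            # rescale old sums to the new max
        p = np.exp(s - m_new[:, None])
        l = l * scale + p.sum(axis=-1)
        acc = acc * scale[:, None] + p @ v[i:i+chunk]
        m = m_new
    return acc / l[:, None]
```

the chunk size trades peak memory against a few extra rescaling passes;
the paper additionally checkpoints each chunk under jax's lax map so
the backward pass recomputes instead of storing chunk activations.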
-------------- next part --------------
A non-text attachment was scrubbed...
Name: chunked_lib.py
Type: text/x-python
Size: 1361 bytes
Desc: not available
URL: <https://lists.cpunks.org/pipermail/cypherpunks/attachments/20220126/3ce00ab7/attachment.py>


More information about the cypherpunks mailing list