
26 Jan 2022, 1:42 p.m.
On 1/26/22, k <gmkarl@gmail.com> wrote:
the AminRezaei0x443 implementation also produces the same data, attached again.
the aminrezaei implementation does the square-root scaling, accepts optional mask and bias tensors, is on pypi, and has both a jax and a torch implementation, so it seems the way to go.
next i'll be timing it against the paper's implementation that i noted as speedy. just on my raspberry pi, though. i'm guessing the comparison comes out roughly the same on good hardware with large models, where the core batches dominate everything. sometimes i mostly engage stuff i bump into.
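a rough sketch of how the timing could be done, stdlib only. the names `time_call` and `compare` are made up for illustration, and the callables passed in would be whatever wrappers end up around the two implementations; nothing here is taken from either codebase. the `block_until_ready` check is there because jax returns results asynchronously, so without it the timer could stop before the work is done.

```python
import statistics
import time


def _wait(result):
    # jax arrays expose block_until_ready(); plain python values do not.
    ready = getattr(result, "block_until_ready", None)
    if callable(ready):
        ready()
    return result


def time_call(fn, *args, repeats=5, warmup=1):
    """Return (median, best) wall-clock seconds over `repeats` calls of fn(*args)."""
    for _ in range(warmup):
        _wait(fn(*args))  # warm-up absorbs one-off costs such as jit compilation
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        _wait(fn(*args))
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), min(samples)


def compare(impls, *args, repeats=5):
    """impls maps a label to a callable; returns label -> median seconds."""
    return {
        name: time_call(fn, *args, repeats=repeats)[0]
        for name, fn in impls.items()
    }
```

calling `compare({"paper": f, "aminrezaei": g}, q, k, v)` with the same inputs for both would give one median per implementation. on a raspberry pi the spread between runs is probably large, so the median seems safer to read than a single run.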
maybe it would be good just to quickly run through the source and verify that aminrezaei does checkpointing and lax mapping like in the paper.
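a quick first pass at that check could be a plain text scan of the module source, sketched below. `scan_source` and `scan_module` are hypothetical helper names, and the patterns are only my guess at what to look for (`jax.checkpoint`, its alias `jax.remat`, `lax.map`, `lax.scan`). a hit only shows the call is mentioned, not that it is used the way the paper uses it, so it narrows down where to read rather than replacing the reading.

```python
import importlib
import inspect
import re

# Patterns are assumptions about what "checkpointing and lax mapping" look like in source.
PATTERNS = {
    "checkpoint": re.compile(r"\b(?:jax\.)?(?:checkpoint|remat)\b"),
    "lax.map": re.compile(r"\blax\.map\b"),
    "lax.scan": re.compile(r"\blax\.scan\b"),
}


def scan_source(source):
    """Return {label: [line numbers]} for code lines that mention each pattern."""
    hits = {label: [] for label in PATTERNS}
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("#", 1)[0]  # crude: drop trailing comments
        for label, pattern in PATTERNS.items():
            if pattern.search(code):
                hits[label].append(lineno)
    return hits


def scan_module(module_name):
    """Import a module by name and scan its source file."""
    module = importlib.import_module(module_name)
    return scan_source(inspect.getsource(module))
```

`inspect.getsource` on a package only returns its `__init__.py`, so `scan_module` would need the name of the submodule that actually holds the jax code; i have not checked what that is called here.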