[ot][spam][crazy][data] transformer model 'attention' improvement

k gmkarl at gmail.com
Wed Jan 26 01:29:10 PST 2022


While making these small tests I discovered that the paper's specific
jax implementation is very fast: noticeably faster, on my raspberry pi
cpu, than my example that didn't use jax.lax.  Huggingface's perceiver
model implementation is not in jax; rather, it's in torch.
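
As a sketch of what the jax.lax version is doing, here is a minimal
chunked-attention example (my own simplification, not the paper's code,
which also chunks the queries and uses checkpointing): it scans over
key/value chunks with lax.scan, keeping a running max and running sum,
so only an [n, chunk_size] block of scores exists at any one time.  The
names here (chunked_attention, chunk_size, the single-head [n, d]
shapes) are my own, and chunk_size is assumed to divide the key length.

import jax
import jax.numpy as jnp
from jax import lax

def chunked_attention(q, k, v, chunk_size=128):
    # q: [n, d], k and v: [m, d]; chunk_size must divide m
    m, d = k.shape
    k_chunks = k.reshape(m // chunk_size, chunk_size, d)
    v_chunks = v.reshape(m // chunk_size, chunk_size, d)

    def scan_fn(carry, chunk):
        acc, weight_sum, running_max = carry
        k_c, v_c = chunk
        s = q @ k_c.T / jnp.sqrt(d)            # [n, chunk_size] scores
        chunk_max = jnp.max(s, axis=-1, keepdims=True)
        new_max = jnp.maximum(running_max, chunk_max)
        # rescale what has been accumulated so far to the new max
        correction = jnp.exp(running_max - new_max)
        p = jnp.exp(s - new_max)
        acc = acc * correction + p @ v_c       # running numerator
        weight_sum = weight_sum * correction + p.sum(-1, keepdims=True)
        return (acc, weight_sum, new_max), None

    n = q.shape[0]
    init = (jnp.zeros((n, d)),                 # numerator
            jnp.zeros((n, 1)),                 # softmax denominator
            jnp.full((n, 1), -jnp.inf))        # running max
    (acc, weight_sum, _), _ = lax.scan(scan_fn, init, (k_chunks, v_chunks))
    return acc / weight_sum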

There are implementations of this memory-efficient attention paper in
pytorch, at https://github.com/moskomule/memory_efficient_attention.pytorch
and https://github.com/AminRezaei0x443/memory-efficient-attention .

I'd like to try them both out on the same example data [might only try
one out though].
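
A rough sketch of a harness for that might look like the following:
generate the example data once, compute a plain-softmax reference, and
report the max absolute difference for whichever implementation is
passed in.  The function names, shapes and seed are placeholders of my
own choosing, and a pytorch implementation would need its tensors
converted at the boundary (e.g. via numpy).

import jax
import jax.numpy as jnp

def reference_attention(q, k, v):
    # plain softmax attention, materializing the full [n, m] score matrix
    s = q @ k.T / jnp.sqrt(q.shape[-1])
    return jax.nn.softmax(s, axis=-1) @ v

def compare(attention_fn, n=256, m=1024, d=64, seed=0):
    # the same example data is used for every implementation under test
    kq, kk, kv = jax.random.split(jax.random.PRNGKey(seed), 3)
    q = jax.random.normal(kq, (n, d))
    k = jax.random.normal(kk, (m, d))
    v = jax.random.normal(kv, (m, d))
    expected = reference_attention(q, k, v)
    return jnp.max(jnp.abs(attention_fn(q, k, v) - expected))

# e.g. compare(chunked_attention) with the sketch above, or with a
# wrapper around either of the pytorch implementations.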

