[ot][spam][crazy] adapters for semibalanced trees?

Undiscussed Horrific Abuse, One Victim of Many gmkarl at gmail.com
Fri Jul 22 00:36:19 PDT 2022


long story short: according to https://arxiv.org/pdf/2112.07916.pdf ,
transient global (tglobal) attention is a new attention mechanism
introduced for longt5, and it appears to reliably outperform local
attention in the same architecture
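
for reference, here is a rough sketch of how i read the mechanism in
the paper: the input is cut into fixed-size blocks, each block is
summed and normalized into one "global" token, and every token then
attends to its nearby neighbours plus all of the global tokens. this is
not the longt5 code. the names rms_normalize, global_tokens and
tglobal_attend are made up for illustration, and i am assuming a single
head, no learned projections, no relative position biases, a plain rms
norm, ordinary scaled dot-product scores, and a shorter final block when
the length does not divide evenly.

```python
import math


def rms_normalize(vec, eps=1e-6):
    # scale a vector to roughly unit root-mean-square
    # (assumption: no learned weight, unlike a real layer norm)
    rms = math.sqrt(sum(v * v for v in vec) / len(vec) + eps)
    return [v / rms for v in vec]


def global_tokens(x, block_size):
    # one summary vector per block: sum the block's token vectors,
    # then normalize the sum
    out = []
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        summed = [sum(column) for column in zip(*block)]
        out.append(rms_normalize(summed))
    return out


def tglobal_attend(x, local_radius, block_size):
    # x is a list of token vectors, used here as queries, keys and
    # values alike (assumption: the real model projects them first)
    dim = len(x[0])
    summaries = global_tokens(x, block_size)
    outputs = []
    for i, query in enumerate(x):
        # local part: tokens within local_radius positions of token i
        lo = max(0, i - local_radius)
        hi = min(len(x), i + local_radius + 1)
        # global part: every block summary, regardless of distance
        keys = x[lo:hi] + summaries
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            for key in keys
        ]
        # numerically stable softmax over local + global keys
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * key[c] for w, key in zip(weights, keys)) / total
            for c in range(dim)
        ])
    return outputs
```

in this sketch each token scores at most 2 * local_radius + 1 local
keys plus one key per block, instead of every token in the sequence,
which is where the savings on long inputs would come from.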

that apparent advantage, combined with the finetuned models i saw on
the huggingface hub that use tglobal attention, seems like enough
reason to try the tglobal models without fully understanding them for
the moment

