[ot][spam][crazy] adapters for semibalanced trees?
Undiscussed Horrific Abuse, One Victim of Many
gmkarl at gmail.com
Fri Jul 22 00:36:19 PDT 2022
long story short: transient global (tglobal) attention is a new
attention mechanism introduced for longt5 in
https://arxiv.org/pdf/2112.07916.pdf . in that paper's results it
appears to reliably outperform local attention in the same
architecture.
that result, combined with the finetuned checkpoints i saw on the
huggingface hub that use tglobal attention, seems reason enough to try
the tglobal models now, without fully understanding how they work
yet.
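
for reference, a rough sketch of the idea as i read it from the paper: the input is cut into fixed-size blocks, each block is summed into one "global" token, and every token attends to its local neighbours plus all of those block summaries. this is only an illustration under my own assumptions, not the longt5 code. the function names, the toy embeddings, and the handling of a short trailing block are made up here, and the normalization the paper applies to each block sum is left out.

```python
def block_summaries(embeddings, block_size):
    """Sum token embeddings inside each fixed-size block.

    Returns one summary vector per block. The paper also normalizes
    each sum; that step is omitted in this sketch. A short trailing
    block is summarized on its own, which is an assumption made here.
    """
    summaries = []
    for start in range(0, len(embeddings), block_size):
        block = embeddings[start:start + block_size]
        summaries.append([sum(column) for column in zip(*block)])
    return summaries


def attended_keys(position, seq_len, radius, num_blocks):
    """Return (local_positions, global_block_ids) visible to one token.

    Local attention alone would give only the first list; the
    transient global variant adds every block summary as an extra key.
    """
    lo = max(0, position - radius)
    hi = min(seq_len, position + radius + 1)
    return list(range(lo, hi)), list(range(num_blocks))


# toy example: 5 tokens with 2-dimensional embeddings (hypothetical values)
tokens = [[1, 0], [2, 0], [0, 3], [0, 4], [5, 5]]
globals_ = block_summaries(tokens, block_size=2)
local_ids, global_ids = attended_keys(0, len(tokens), radius=1,
                                      num_blocks=len(globals_))
```

the point of the sketch is only that the number of keys per token stays small (a window of width 2 * radius + 1, plus one key per block) instead of growing with the full sequence length.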