[ot][spam][crazy][log] transformer?

Undescribed Horrific Abuse, One Victim & Survivor of Many gmkarl at gmail.com
Wed Oct 5 17:44:00 PDT 2022

baffo32 2022-10-06 00:09 UTC
There’s been a paper quietly kicking around that speeds up model
training by up to 370x, flattens architectures to a single layer,
drops memory requirements by 10x, and effectively supports long
context: . It’s slated for the upcoming NeurIPS 2022.
I’ve been kicking it around in my mind a bit, and I think any
implementation of this paper at all, by anybody at all, would be so
useful and likely appreciated that it would be worth trying to do.
The concept of the paper is not complex, but, as usual, I find it
quite hard to approach.

More information about the cypherpunks mailing list