[ot][spam][crazy][log] transformer?

Undescribed Horrific Abuse, One Victim & Survivor of Many gmkarl at gmail.com
Wed Oct 5 17:44:00 PDT 2022

baffo32 2022-10-06 00:09 UTC
There’s been a paper quietly kicking around that speeds up model
training by up to 370x, flattens architectures to a single layer,
drops memory requirements by 10x, and effectively supports long
context: . It’s slated for the upcoming NeurIPS 2022.
I’ve been kicking it around in my mind a bit, and I think any
implementation of this paper at all, by anybody at all, would be so
useful and likely appreciated that it would be worth trying to do.
The concept of the paper is not complex, but, as usual, I find it
quite hard to approach.

More information about the cypherpunks mailing list