28 Dec 2022
9:21 p.m.
earlier https://discord.com/channels/823813159592001537/1051442019035254784/10577643... re "modern": I was thinking of things like the linear transformer, RWKV, the holographic HRRformer, S4D or SGConv, and the new "MEGA" model … something tuned for long context with less RAM. The HRRformer paper says it's competitive with only one model layer. I'd also use adapters to speed up training and make things more accessible; a rough sketch of the adapter idea is below.
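not from the linked thread, just a minimal sketch of what I mean by adapters (bottleneck-adapter style, hypothetical module name and dimensions): freeze the base model and train only small inserted layers.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual add."""
    def __init__(self, d_model: int, d_bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, d_bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(d_bottleneck, d_model)
        # start as an identity mapping so adding the adapter doesn't disturb the base model
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

# usage: freeze the base layer, train only the adapter weights
base = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
for p in base.parameters():
    p.requires_grad = False          # base weights stay fixed
adapter = Adapter(d_model=512)       # only these few parameters get gradients

x = torch.randn(2, 16, 512)          # (batch, sequence, d_model)
out = adapter(base(x))
```

the point being that only the tiny adapter weights need gradients and optimizer state, which is what cuts the RAM and training cost.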