28 Dec 2022, 9:21 p.m.
Earlier: https://discord.com/channels/823813159592001537/1051442019035254784/10577643... Re "modern", I was thinking of things like the linear transformer, RWKV, the holographic HRRformer, S4D or SGConv, the new "Mega" model ... something tuned for long context with less RAM. The HRRformer paper says it is competitive with only a single model layer. I'd also use adapters to speed up training and make fine-tuning more accessible.
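To illustrate why models like the linear transformer use less RAM on long contexts: they replace the quadratic attention matrix with a feature map so the key-value sum can be computed once and reused for every query. A minimal NumPy sketch, assuming the elu+1 feature map from the linear-transformer paper (the function names and sizes here are illustrative, not from any of the libraries above):

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: keeps features positive so the normaliser is well defined
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    # O(N * d^2) memory/time: summarise all keys/values once, reuse per query
    Qf, Kf = feature_map(Q), feature_map(K)
    S = Kf.T @ V                  # (d, d_v) key-value summary
    z = Kf.sum(axis=0)            # (d,)    normaliser
    return (Qf @ S) / (Qf @ z)[:, None]

def quadratic_attention(Q, K, V):
    # O(N^2): materialises the full N x N attention matrix, for comparison
    Qf, Kf = feature_map(Q), feature_map(K)
    A = Qf @ Kf.T
    return (A @ V) / A.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
N, d, d_v = 16, 8, 4
Q = rng.normal(size=(N, d))
K = rng.normal(size=(N, d))
V = rng.normal(size=(N, d_v))

# Both routes compute the same output; only the cost differs.
assert np.allclose(linear_attention(Q, K, V), quadratic_attention(Q, K, V))
```

The point of the rearrangement is that `S` and `z` have fixed size regardless of sequence length, which is what makes long-context inference cheap in RAM.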