12 Sep
2023
1:22 a.m.
i’m looking at LM-Infinite a little bit, maybe i can make some progress on getting it working.

paper: https://arxiv.org/pdf/2308.16137.pdf
partial implementation: https://github.com/kyegomez/LM-Infinite/blob/main/infinite/main.py

the theory seems to be that if out-of-context tokens are kept reachable at the very start of the context window, in some empirically determined way, quality on long-context outputs radically improves. the partial implementation doesn’t include the new calculation of position encodings, which differs depending on the model the length extension is applied to.
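rough sketch of my current reading of the paper’s Λ-shaped mask: every query attends to the first few starting tokens plus a sliding local window, and the relative distance fed to position encodings is clamped to a ceiling (that clamp is the piece the partial repo skips). parameter names `n_global`, `n_local`, `dist_ceiling` are mine, not the paper’s, and the values below are just illustrative:

```python
import numpy as np

def lambda_mask(seq_len, n_global, n_local):
    # Λ-shaped attention mask sketch, as i understand it from the paper
    q = np.arange(seq_len)[:, None]  # query positions
    k = np.arange(seq_len)[None, :]  # key positions
    causal = k <= q                  # standard causal constraint
    starting = k < n_global          # always-visible starting tokens
    local = (q - k) < n_local        # sliding local window
    return causal & (starting | local)

def clamped_distance(q_pos, k_pos, dist_ceiling):
    # the paper also bounds the effective relative distance used for
    # position encodings; this is the part missing from the partial repo
    return min(q_pos - k_pos, dist_ceiling)
```

with `lambda_mask(8, n_global=2, n_local=3)`, token 7 can still see tokens 0 and 1 (starting branch) and tokens 5–7 (local branch), but not e.g. token 3, which falls outside both.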