11 Sep
2023
4:35 p.m.
so now i’m looking at https://github.com/huggingface/transformers/blob/main/src/transformers/model... and the lm-infinite implementation actually adds the token distance to the attention logit, i think before the softmax is taken, and i think the paper describes it this way too. this seems quite strange to me, but maybe i've forgotten something. maybe the implementation was autogenerated or something ;p
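to make concrete what "adding the token distance to the logit before the softmax" would do, here's a minimal sketch. it assumes plain single-query scaled dot-product attention with a causal mask. it is NOT the lm-infinite or transformers code: `softmax`, `attention_weights_with_distance` and the `slope` knob are hypothetical names for illustration. with `slope=+1.0` the distance is literally added, so farther tokens get *more* weight. a negative slope gives an ALiBi-style penalty instead, where nearer tokens are favored.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights_with_distance(q, keys, query_pos, slope=1.0):
    """Causal attention weights for one query, with a distance term added to each logit.

    Hypothetical sketch: slope=+1.0 adds the raw token distance to the logit,
    slope<0 subtracts it (ALiBi-style), slope=0 is plain attention.
    """
    d = len(q)
    logits = []
    for key_pos in range(query_pos + 1):  # causal: only keys at or before the query
        k = keys[key_pos]
        score = sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
        distance = query_pos - key_pos  # 0 for the query's own position
        logits.append(score + slope * distance)  # bias is added BEFORE the softmax
    return softmax(logits)

# identical keys, so only the distance term differs between positions
keys = [[1.0, 0.0]] * 4
w_plus = attention_weights_with_distance([1.0, 0.0], keys, query_pos=3, slope=1.0)
w_minus = attention_weights_with_distance([1.0, 0.0], keys, query_pos=3, slope=-1.0)
```

with identical keys, `w_plus` puts the most weight on position 0 (the farthest token) and `w_minus` on position 3 (the query itself), which is why a positive "add the distance" bias looks backwards compared to the usual recency penalty.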