2 Feb
2022
10:54 a.m.
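- GPT-J uses a pregenerated constant causal mask that takes O(n^2) memory in the sequence length n. since each entry is simply a constant function of the sequence indices (position i may attend to position j only when j <= i), it could be computed on the fly via a callback or inside a loop instead of being stored.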
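
A minimal sketch of the callback idea, in plain Python rather than GPT-J's actual code. The names `causal_allowed`, `masked_softmax_row`, and `pregenerated_mask` are made up for illustration; the point is only that the mask entry can be evaluated per (i, j) when needed, so no n x n table has to be kept around.

```python
import math


def causal_allowed(query_idx: int, key_idx: int) -> bool:
    """Hypothetical mask callback: attend only to keys at or before the query."""
    return key_idx <= query_idx


def pregenerated_mask(n: int):
    """The stored-table approach, for comparison: an n x n boolean matrix."""
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax_row(scores_row, query_idx, allowed=causal_allowed):
    """Softmax over one row of attention scores, masking on the fly.

    Only O(n) booleans exist at a time (one row), never the full n x n mask.
    Assumes the callback allows at least one key for this query.
    """
    keep = [allowed(query_idx, j) for j in range(len(scores_row))]
    # Subtract the largest allowed score for numerical stability.
    peak = max(s for s, k in zip(scores_row, keep) if k)
    exps = [math.exp(s - peak) if k else 0.0 for s, k in zip(scores_row, keep)]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Query at position 1 can see keys 0 and 1, but not key 2.
    print(masked_softmax_row([0.0, 0.0, 0.0], query_idx=1))
```

In a real tensor framework the same effect would more likely come from comparing two index ranges per block than from a Python loop, but that is a design choice outside this note.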