It's becoming more common to use recurrent models, such as RWKV [1], that no longer have fixed bounds on their input and output sizes. This removes half the challenge of this task.

[1] https://github.com/BlinkDL/RWKV-LM
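
To make the "no bounds on input size" point concrete, here is a minimal sketch of the general idea. It is a toy recurrence, not RWKV's actual architecture; the `step` and `run` names, the update rule, and the state size are all made up for illustration. The point is only that a recurrent model carries a fixed-size state forward, so the length of the input never has to be known or capped.

```python
import math
from typing import Iterable, List


def step(state: List[float], x: float, decay: float = 0.9) -> List[float]:
    """One recurrent update: the new state depends only on the old state and x."""
    # Toy update rule (an assumption for illustration, not RWKV's):
    # leaky accumulation squashed through tanh, with a different gain per unit.
    return [math.tanh(decay * s + (i + 1) * 0.1 * x) for i, s in enumerate(state)]


def run(inputs: Iterable[float], state_size: int = 4) -> List[float]:
    """Consume a stream of any length while holding only `state_size` floats."""
    state = [0.0] * state_size
    for x in inputs:
        state = step(state, x)
    return state


# A three-element input and a 100,000-element stream use the same amount of state.
short = run([1.0, 0.5, -0.25])
long = run(math.sin(t / 10) for t in range(100_000))
print(len(short), len(long))
```

Memory use here is independent of sequence length, which is the property the comment is pointing at. A fixed context window, by contrast, forces the input to be truncated or chunked.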