It's becoming more common to use recurrent models such as RWKV, whose input and output lengths aren't bounded by a fixed context window. That removes half the challenge of this task. https://github.com/BlinkDL/RWKV-LM
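
To make the point concrete, here is a minimal sketch of why a recurrent model has no hard limit on input length. It assumes a toy update rule and is not RWKV's actual architecture or API; `step` and `run` are hypothetical names made up for illustration. The only thing carried between tokens is a fixed-size state, so memory use stays constant however long the stream is.

```python
import math
from typing import Iterable, List


def step(state: List[float], x: float, decay: float = 0.9) -> List[float]:
    """One recurrent update: the new state depends only on the old state and the current input."""
    # Toy rule (an assumption, not RWKV's): two exponential moving averages.
    return [
        decay * state[0] + (1.0 - decay) * x,
        decay * state[1] + (1.0 - decay) * math.tanh(x),
    ]


def run(stream: Iterable[float]) -> List[float]:
    """Consume a stream of any length while holding only a two-element state."""
    state = [0.0, 0.0]
    for x in stream:  # works with generators, so the input is never fully in memory
        state = step(state, x)
    return state


if __name__ == "__main__":
    short = run(1.0 for _ in range(10))
    long = run(1.0 for _ in range(100_000))
    # The state has the same size regardless of how many inputs were seen.
    print(len(short), len(long))
```

Whether a real model still produces good output after far more tokens than it saw in training is a separate question; the sketch only shows that nothing in the recurrence itself caps the length.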