This seems helpful since transformers are so common now: HyperTransformer is a transformer that generates the weights of other models (small CNNs in the paper, rather than other transformers): https://github.com/google-research/google-research/tree/master/hypertransformer
Other papers have newer ideas, for example generating a kernel-based Bayesian model that combines information across tasks and includes uncertainty in its outputs.
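
To make the "one model generates another model" idea concrete, here is a toy sketch in plain Python. It is not the HyperTransformer architecture and nothing in it is trained: it assumes a single hand-written attention step per class, where the query is the class id, the keys are the support labels, and the values are the support features. The names `generate_classifier`, `predict`, and `sharpness` are made up for this illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def generate_classifier(support_x, support_y, num_classes, sharpness=10.0):
    """Toy weight generator (hypothetical, untrained).

    For each class, attend over the support set and use the attended
    feature vector as that class's weight row in a linear classifier.
    """
    dim = len(support_x[0])
    weights = []
    for c in range(num_classes):
        # Score is high when a support sample's label matches class c.
        scores = [sharpness if y == c else 0.0 for y in support_y]
        attn = softmax(scores)
        # Attention-weighted average of the support features.
        row = [sum(a * x[d] for a, x in zip(attn, support_x)) for d in range(dim)]
        weights.append(row)
    return weights


def predict(weights, x):
    """Run the generated classifier: pick the class with the largest dot product."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return logits.index(max(logits))


# A tiny two-class "task" with four labeled support samples.
support_x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
support_y = [0, 0, 1, 1]

# The classifier's weights come out of one forward pass, with no gradient steps.
weights = generate_classifier(support_x, support_y, num_classes=2)
label_a = predict(weights, [0.8, 0.2])
label_b = predict(weights, [0.2, 0.8])
```

The point is only the shape of the pipeline: a support set goes in, a ready-to-use set of weights comes out, and a new task just means calling the generator again with different support samples. In the real thing the generator is a learned transformer rather than a fixed rule like this one.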