Quote from the newer paper:

"In this study we show that training stability can be achieved with fewer sacrifices on model effectiveness. There are other pathways to improving LLM efficiency. One is to augment a language model with a retrieval component to fetch external knowledge that is useful to perform downstream tasks. So, the size of the language model can be significantly reduced since it does not need to encode everything in model parameters (Guu et al., 2020; Khandelwal et al., 2020; Borgeaud et al., 2021; Gui et al., 2021; Zhang et al., 2021). With sparse model structures, Mixture of Experts models (Artetxe et al., 2021; Fedus et al., 2021; Zuo et al., 2021; Zoph et al., 2022) adaptively activate a subset of model parameters (experts) for different inputs during model training and inference. The METRO method proposed in this paper is orthogonal to retrieval-augmented models and sparsely activated models. Their combination is an interesting future work direction."

This next one is on training an assistant with human feedback and reinforcement learning. Nice to have somebody actually talk about that. https://arxiv.org/abs/2204.05862
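As a side note, the heart of the preference-model stage in RLHF work like this is a simple pairwise ranking loss: the reward model should score the human-preferred response above the dispreferred one. Here is a rough, self-contained sketch of that idea; this is my own toy code, not code from the paper, and the tiny model, vocabulary size, and hyperparameters are placeholders.

```python
# Minimal sketch (toy illustration, not from arXiv:2204.05862) of the pairwise
# preference loss commonly used to train a reward/preference model for RLHF:
# the model is trained so the "chosen" response scores higher than the "rejected" one.
import torch
import torch.nn as nn


class TinyRewardModel(nn.Module):
    """Toy stand-in for a language-model-based reward model.

    A real setup would pool hidden states from a pretrained transformer;
    a bag-of-embeddings plus a linear head keeps this sketch self-contained.
    """

    def __init__(self, vocab_size: int = 1000, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.score = nn.Linear(dim, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> one scalar reward per sequence
        pooled = self.embed(token_ids).mean(dim=1)
        return self.score(pooled).squeeze(-1)


def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # -log sigmoid(r_chosen - r_rejected): pushes the chosen response's
    # reward above the rejected one's.
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = TinyRewardModel()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Fake comparison batch: each row pairs a preferred and a dispreferred response.
    chosen = torch.randint(0, 1000, (8, 16))
    rejected = torch.randint(0, 1000, (8, 16))

    loss = preference_loss(model(chosen), model(rejected))
    loss.backward()
    opt.step()
    print(f"pairwise preference loss: {loss.item():.4f}")
```

The trained reward model then serves as the objective for the reinforcement-learning step, where the assistant's policy is tuned to produce responses the reward model scores highly.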