In this study, we show that training stability can be achieved with a smaller sacrifice in model effectiveness.
There are other pathways to improving LLM efficiency. One is to augment a language model with a retrieval component that fetches external knowledge useful for downstream tasks. As a result, the size of the language model can be significantly reduced, since it does not need to encode everything in its parameters (Guu et al., 2020; Khandelwal et al., 2020; Borgeaud et al., 2021; Gui et al., 2021; Zhang et al., 2021). With sparse model structures, Mixture-of-Experts models (Artetxe et al., 2021; Fedus et al., 2021; Zuo et al., 2021; Zoph et al., 2022) adaptively activate a subset of model parameters (experts) for different inputs during training and inference. The METRO method proposed in this paper is orthogonal to retrieval-augmented and sparsely activated models; combining them is an interesting direction for future work.
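For illustration, a minimal sketch of the retrieval-augmented setup described above might look as follows; the encoder, passage index, and generator here are hypothetical stand-ins rather than the specific systems cited.

```python
# Sketch of retrieval augmentation: an external index supplies passages that
# are prepended to the input, so the language model itself can stay smaller.
# All components below are illustrative assumptions, not the cited systems.
import torch

def embed(texts, dim=64):
    # Stand-in encoder: deterministic pseudo-embeddings keyed on the text hash.
    gens = [torch.Generator().manual_seed(hash(t) % (2**31)) for t in texts]
    return torch.stack([torch.randn(dim, generator=g) for g in gens])

class RetrievalAugmentedLM:
    def __init__(self, passages, language_model):
        self.passages = passages
        self.index = embed(passages)             # dense passage index
        self.lm = language_model                 # a (smaller) generator

    def generate(self, query, k=2):
        q = embed([query])                       # encode the query
        scores = self.index @ q.T                # inner-product retrieval
        top = scores.squeeze(1).topk(k).indices  # top-k passages
        context = " ".join(self.passages[i] for i in top)
        # Condition the generator on retrieved context plus the query.
        return self.lm(f"{context}\n\n{query}")

# Usage with a dummy "generator" that only reports its prompt length.
ralm = RetrievalAugmentedLM(
    passages=["Paris is the capital of France.", "METRO is a pretraining method."],
    language_model=lambda prompt: f"<generated from {len(prompt)} prompt chars>",
)
print(ralm.generate("What is the capital of France?"))
```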
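Similarly, a sparsely activated layer can be sketched with top-k gating; the sizes, gating rule, and expert networks below are illustrative assumptions, not the cited architectures (which additionally use load balancing, capacity limits, and other refinements).

```python
# Sketch of a Mixture-of-Experts layer: a gating network routes each token to
# its top-k experts, so only a subset of parameters is active per input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=32, d_hidden=64, n_experts=4, k=1):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)        # routing network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                # x: (tokens, d_model)
        weights = F.softmax(self.gate(x), dim=-1)        # (tokens, n_experts)
        top_w, top_idx = weights.topk(self.k, dim=-1)    # keep only k experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                routed = top_idx[:, slot] == e           # tokens routed to e
                if routed.any():
                    out[routed] += top_w[routed, slot, None] * expert(x[routed])
        return out                                       # k experts ran per token

tokens = torch.randn(8, 32)
print(TopKMoE()(tokens).shape)  # torch.Size([8, 32])
```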