In this study, we show that training stability can be achieved with a smaller sacrifice in model effectiveness.
There are other pathways to improving LLM efficiency. One is to augment a language model with a retrieval component that fetches external knowledge useful for downstream tasks. As a result, the size of the language model can be significantly reduced, since it does not need to encode everything in its parameters (Guu et al., 2020; Khandelwal et al., 2020; Borgeaud et al., 2021; Gui et al., 2021; Zhang et al., 2021). With sparse model structures, Mixture-of-Experts models (Artetxe et al., 2021; Fedus et al., 2021; Zuo et al., 2021; Zoph et al., 2022) adaptively activate a subset of model parameters (experts) for different inputs during training and inference. The METRO method proposed in this paper is orthogonal to retrieval-augmented and sparsely activated models; combining them is an interesting direction for future work.
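For illustration, a minimal sketch of the retrieval-augmented setup described above might look as follows; the encoder, passage index, and generator here are hypothetical stand-ins rather than the specific systems cited.

```python
# Sketch of retrieval augmentation: an external index supplies passages that
# are prepended to the input, so the language model itself can stay smaller.
# All components below are illustrative assumptions, not the cited systems.
import torch

def embed(texts, dim=64):
    # Stand-in encoder: deterministic pseudo-embeddings keyed on the text hash.
    gens = [torch.Generator().manual_seed(hash(t) % (2**31)) for t in texts]
    return torch.stack([torch.randn(dim, generator=g) for g in gens])

class RetrievalAugmentedLM:
    def __init__(self, passages, language_model):
        self.passages = passages
        self.index = embed(passages)             # dense passage index
        self.lm = language_model                 # a (smaller) generator

    def generate(self, query, k=2):
        q = embed([query])                       # encode the query
        scores = self.index @ q.T                # inner-product retrieval
        top = scores.squeeze(1).topk(k).indices  # top-k passages
        context = " ".join(self.passages[i] for i in top)
        # Condition the generator on retrieved context plus the query.
        return self.lm(f"{context}\n\n{query}")

# Usage with a dummy "generator" that only reports its prompt length.
ralm = RetrievalAugmentedLM(
    passages=["Paris is the capital of France.", "METRO is a pretraining method."],
    language_model=lambda prompt: f"<generated from {len(prompt)} prompt chars>",
)
print(ralm.generate("What is the capital of France?"))
```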
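Similarly, a sparsely activated layer can be sketched with top-k gating; the sizes, gating rule, and expert networks below are illustrative assumptions, not the cited architectures (which additionally use load balancing, capacity limits, and other refinements).

```python
# Sketch of a Mixture-of-Experts layer: a gating network routes each token to
# its top-k experts, so only a subset of parameters is active per input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=32, d_hidden=64, n_experts=4, k=1):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)        # routing network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                # x: (tokens, d_model)
        weights = F.softmax(self.gate(x), dim=-1)        # (tokens, n_experts)
        top_w, top_idx = weights.topk(self.k, dim=-1)    # keep only k experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                routed = top_idx[:, slot] == e           # tokens routed to e
                if routed.any():
                    out[routed] += top_w[routed, slot, None] * expert(x[routed])
        return out                                       # k experts ran per token

tokens = torch.randn(8, 32)
print(TopKMoE()(tokens).shape)  # torch.Size([8, 32])
```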