Quote from the newer paper:

"In this study we show that training stability can be achieved with fewer sacrifices on model effectiveness. There are other pathways to improving LLM efficiency. One is to augment a language model with a retrieval component to fetch external knowledge that is useful to perform downstream tasks. So, the size of the language model can be significantly reduced since it does not need to encode everything in model parameters (Guu et al., 2020; Khandelwal et al., 2020; Borgeaud et al., 2021; Gui et al., 2021; Zhang et al., 2021). With sparse model structures, Mixture of Experts models (Artetxe et al., 2021; Fedus et al., 2021; Zuo et al., 2021; Zoph et al., 2022) adaptively activate a subset of model parameters (experts) for different inputs during model training and inference. The METRO method proposed in this paper is orthogonal to retrieval-augmented models and sparsely activated models. Their combination is an interesting future work direction."

The next one is on training a helpful assistant with reinforcement learning from human feedback. Nice to have somebody actually talk about that. [1]

References

1. https://arxiv.org/abs/2204.05862
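Aside: the quoted paragraph only gestures at how sparse activation works, so here is a minimal NumPy sketch of the routing idea it describes: a learned router scores the experts for each input, only the top-k experts are run, and their outputs are mixed. The toy dimensions, the moe_layer function, and the single-token routing are all illustrative assumptions, not code from any of the cited papers.

```python
# Toy sketch of sparse Mixture-of-Experts routing (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 8, 4, 2

# Each "expert" is just a small feed-forward weight matrix in this sketch.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
router_w = rng.normal(size=(d_model, n_experts))


def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token vector x to its top-k experts and mix their outputs."""
    scores = x @ router_w                      # one score per expert
    top = np.argsort(scores)[-top_k:]          # indices of the k highest-scoring experts
    gate = np.exp(scores[top] - scores[top].max())
    gate /= gate.sum()                         # softmax over the selected experts only
    # Only the selected experts are evaluated; the rest stay inactive,
    # which is what keeps per-token compute small despite a large parameter count.
    return sum(g * (x @ experts[i]) for g, i in zip(gate, top))


token = rng.normal(size=d_model)
print(moe_layer(token).shape)  # (8,) -- same shape as the input, like a dense FFN
```

The point of the sketch is just the contrast with a dense layer: parameters grow with the number of experts, but each input only pays for k of them.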