this paper (RETRO) says retrieval lets a model match the performance of models with 25x more parameters, by looking up chunks from a trillion-token database instead of memorizing everything in weights: https://www.deepmind.com/publications/improving-language-models-by-retrieving-from-trillions-of-tokens
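
to make the core idea concrete, here's a minimal sketch (not DeepMind's code) of retrieval-augmented generation: embed the query, pull the nearest-neighbor chunks from a text database by cosine similarity, and condition generation on them. `embed` and `generate` are hypothetical stand-ins for a real encoder and language model, and the toy database stands in for RETRO's trillion-token one.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical encoder: map text to a unit vector (random here for illustration)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(128)
    return v / np.linalg.norm(v)

# Toy retrieval database of (chunk, embedding) pairs.
DATABASE = ["chunk about topic A", "chunk about topic B", "chunk about topic C"]
DB_VECS = np.stack([embed(c) for c in DATABASE])

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k chunks nearest the query (unit vectors, so dot = cosine)."""
    sims = DB_VECS @ embed(query)
    return [DATABASE[i] for i in np.argsort(-sims)[:k]]

def generate(prompt: str, neighbors: list[str]) -> str:
    """Stand-in for a model that conditions its output on retrieved neighbors."""
    context = "\n".join(neighbors)
    return f"<output conditioned on:\n{context}\nprompt: {prompt}>"

print(generate("tell me about topic A", retrieve("tell me about topic A")))
```

the point of the design is that factual knowledge lives in the database, which can be swapped or grown without retraining, so the parameters only need to handle language, hence the 25x claim.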

if we're limited to building on work we happen to bump into, then it makes sense to look up what has cited that paper (or go bump into something else)