[ot][crazy] Mainstream AI news snippets

Undescribed Horrific Abuse, One Victim & Survivor of Many gmkarl at gmail.com
Wed May 17 03:33:48 PDT 2023


TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
via reddit
https://www.reddit.com/r/MachineLearning/comments/13j0spj/r_tiny_language_models_below_10m_parameters_or/
https://arxiv.org/pdf/2305.07759

Language models (LMs) are powerful tools for natural language
processing, but they often struggle to produce coherent and fluent
text when they are small. Models with around 125M parameters such as
GPT-Neo (small) or GPT-2 (small) can rarely generate coherent and
consistent English text beyond a few words even after extensive
training. This raises the question of whether the emergence of the
ability to produce coherent English text only occurs at larger scales
(with hundreds of millions of parameters or more) and complex
architectures (with many layers of global attention).
In this work, we introduce TinyStories, a synthetic dataset of short
stories, generated by GPT-3.5 and GPT-4, that contain only words a
typical 3- to 4-year-old usually understands. We show that TinyStories
can be used to train and evaluate LMs that are much smaller than the
state-of-the-art models (below 10 million total parameters) or have
much simpler architectures (with only one transformer block), yet
still produce fluent and consistent stories several paragraphs long
that are diverse, have almost perfect grammar, and demonstrate
reasoning capabilities.
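
For a concrete sense of the scale involved, here is a minimal Python
sketch of configuring a one-block model in the sub-10M range with the
Hugging Face transformers library. The dimensions and the Hub dataset
id "roneneldan/TinyStories" are illustrative assumptions, not the
paper's exact setup:

from datasets import load_dataset
from transformers import GPT2Config, GPT2LMHeadModel

# Assumed Hub id for the dataset; not confirmed by the post.
stories = load_dataset("roneneldan/TinyStories", split="train")

config = GPT2Config(
    vocab_size=50257,   # standard GPT-2 BPE vocabulary
    n_positions=512,    # short context is enough for short stories
    n_embd=64,          # tiny hidden size (assumption, not the paper's value)
    n_layer=1,          # a single transformer block, as the abstract describes
    n_head=8,
)
model = GPT2LMHeadModel(config)

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # ~3.3M, well under 10M

At this scale the 50257 x 64 token embedding (about 3.2M parameters)
dominates the total; the transformer block itself contributes only
around 50K parameters.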

