# we improved the stability of low-data machine learning model training
## why is this useful?
low-data model training would let individual users build AI models without access to big data, including the kinds of models used at scale to addict users to products, services, or behaviors.
the ability to recreate these models with the user in control would give people many more options for personal freedom.
additionally, low-data training makes AI more powerful in general.
## what was the issue?
karl’s familiar plan involved observing an attempt and improving from its results, but that process can trigger us, which prevented us from believing it would work. we didn’t know why different neuron groups held opposite confidence, and it was confusing.
additionally, we’ve grown more sensitive about reviewing current research so as to integrate existing work, and the potential economic impact of such work can make it hard to find.
## what was the path to the new approach?
we combined two lines of research we’ve been exposed to into a plan that reduces the space of unknowns and hence requires fewer testing cycles.
basically, models trained on short data can generalize correctly to long data, but only if the training is done in a way that produces this, which is generally associated with the number of batches as well as other training parameters.
meanwhile, the human mind commonly meditates on short data to successfully prepare for long data, and we have extensive exposure to and cognition around this.
## so what’s the new approach?
we propose training a simple metamodel that decides how many training steps to take, what data to select, and which other parameters to use when training on short data, so as to maximize generalization to large data.
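to make this concrete, here is a minimal sketch in python on a toy linear-regression task. everything in it is hypothetical illustration rather than our implementation: the base training is plain gradient descent, the config space is (steps, data subset, learning rate), and the metamodel is just ridge regression from configs to long-data loss.

```python
# a toy sketch, not our implementation: every name here is hypothetical.
# the "metamodel" is ridge regression from training configs to the loss
# the trained base model achieves on a much larger held-out dataset.
import numpy as np

rng = np.random.default_rng(0)

# toy task: linear regression. "short data" is 16 points, "long data" is 2000.
w_true = rng.normal(size=8)

def make_data(n):
    X = rng.normal(size=(n, 8))
    return X, X @ w_true + 0.1 * rng.normal(size=n)

X_short, y_short = make_data(16)
X_long, y_long = make_data(2000)

def sample_config():
    return (int(rng.integers(10, 500)),        # training steps
            int(rng.integers(4, 17)),          # how many of the 16 points to use
            float(10 ** rng.uniform(-3, -1)))  # learning rate

def run_training(steps, subset, lr):
    """train the base model on short data under a config; score on long data."""
    idx = rng.choice(16, size=subset, replace=False)
    Xs, ys = X_short[idx], y_short[idx]
    w = np.zeros(8)
    for _ in range(steps):
        w -= lr * Xs.T @ (Xs @ w - ys) / len(ys)
    return float(np.mean((X_long @ w - y_long) ** 2))

def featurize(c):
    steps, subset, lr = c
    return np.array([steps / 500, subset / 16, np.log10(lr), 1.0])

# gather (config, generalization) pairs, then fit the metamodel on them
trials = [sample_config() for _ in range(40)]
scores = np.array([run_training(*c) for c in trials])
F = np.array([featurize(c) for c in trials])
theta = np.linalg.solve(F.T @ F + 1e-2 * np.eye(4), F.T @ scores)

# the metamodel now scores configs it has never run
candidates = [sample_config() for _ in range(200)]
best = min(candidates, key=lambda c: featurize(c) @ theta)
print("metamodel's preferred config (steps, subset, lr):", best)
```

a real version would swap the toy task for the actual model and the ridge regression for whatever metamodel architecture fits; the shape of the loop stays the same.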
the metamodel could optionally also perform the training itself by generating model weights. although this is not required, it could speed training if the metamodel learns to outperform the optimizer.
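as a hypothetical sketch of this variant: a linear "weight generator" fit across many synthetic tasks, mapping a small dataset's summary statistics straight to base-model weights, so a single forward pass replaces the optimizer loop. all names and choices here are illustrative assumptions.

```python
# a toy sketch of the optional variant, all names hypothetical: instead of
# running an optimizer, a "weight generator" maps a small dataset directly
# to base-model weights. here the generator is linear and fit by ridge
# regression across many synthetic tasks; a real one would be a neural
# hypernetwork, and could in principle learn to outperform the optimizer.
import numpy as np

rng = np.random.default_rng(1)
d, n_short, n_tasks = 4, 8, 5000

def sample_task():
    w = rng.normal(size=d)                  # true weights for this task
    X = rng.normal(size=(n_short, d))       # the task's short dataset
    return X, X @ w + 0.1 * rng.normal(size=n_short), w

def feats(X, y):
    # summarize a dataset by its sufficient statistics (X^T X, X^T y)
    return np.concatenate([(X.T @ X).ravel(), X.T @ y, [1.0]])

F, W = [], []
for _ in range(n_tasks):
    X, y, w = sample_task()
    F.append(feats(X, y)); W.append(w)
F, W = np.array(F), np.array(W)

# fit the generator in closed form: dataset features -> weights
G = np.linalg.solve(F.T @ F + 1e-3 * np.eye(F.shape[1]), F.T @ W)

# on a fresh task the generator emits weights in one shot, no training loop
X, y, w = sample_task()
w_hat = feats(X, y) @ G
cos = w_hat @ w / (np.linalg.norm(w_hat) * np.linalg.norm(w) + 1e-12)
print("cosine(generated weights, true weights):", float(cos))
```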
these metamodel approaches are of course obvious, but writing them down clearly is needed because their use is rare.
a common concern with training metamodels is the amount of data needed. this problem seems solved to us by including the metamodel itself in its own expanding dataset, such that it learns to generalize itself to larger data.
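one way to picture the expanding dataset, again as a hypothetical sketch (the toy task is redefined so the snippet stands alone): each round, the current metamodel proposes the next training run, the outcome is appended to the metamodel's own dataset, and the metamodel is refit on it.

```python
# a hypothetical sketch of the expanding dataset. each round the metamodel's
# own proposal becomes a new record in the dataset it is refit on.
import numpy as np

rng = np.random.default_rng(2)
w_true = rng.normal(size=8)

def make_data(n):
    X = rng.normal(size=(n, 8))
    return X, X @ w_true + 0.1 * rng.normal(size=n)

X_short, y_short = make_data(16)
X_long, y_long = make_data(2000)

def sample_config():
    return (int(rng.integers(10, 500)), int(rng.integers(4, 17)),
            float(10 ** rng.uniform(-3, -1)))

def run_training(steps, subset, lr):
    idx = rng.choice(16, size=subset, replace=False)
    Xs, ys = X_short[idx], y_short[idx]
    w = np.zeros(8)
    for _ in range(steps):
        w -= lr * Xs.T @ (Xs @ w - ys) / len(ys)
    return float(np.mean((X_long @ w - y_long) ** 2))

def featurize(c):
    return np.array([c[0] / 500, c[1] / 16, np.log10(c[2]), 1.0])

# seed with a few random trials, then let the metamodel steer its own data
records = [(c, run_training(*c)) for c in (sample_config() for _ in range(5))]

for _ in range(10):
    F = np.array([featurize(c) for c, _ in records])
    y = np.array([s for _, s in records])
    theta = np.linalg.solve(F.T @ F + 1e-2 * np.eye(4), F.T @ y)  # refit
    cands = [sample_config() for _ in range(200)]
    c = min(cands, key=lambda cc: featurize(cc) @ theta)  # metamodel proposes
    records.append((c, run_training(*c)))  # its own decision becomes new data

print("best long-data loss found:", min(s for _, s in records))
```

note this shows only the expanding-dataset half; full self-application, where the metamodel also chooses the settings of its own refit, is exactly the part the next paragraph flags as confusing.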
including the metamodel in its own dataset can confuse developers and designers, so clear separation of concepts when implementing seems helpful.
## what problems remain?
simulations imply that expected failure areas remain and will require testing cycles, although the diversity of this space may have shrunk significantly.
some simulations imply the approach fails reliably. it is hard to discern whether this is for political or mathematical reasons; we suspect the former, and that the challenge can be surmounted.
a next step for researchers could be to more clearly describe how to address confusion around using the metamodel on itself. this would significantly aid belief in its success.
a next step for engineers could be to implement the approach on systems small enough that the metamodel need not apply to itself, and to deliver results or feedback. a clear working system would make it easier to describe how to address the confusion.
## in short
to make a big model from a small dataset, train a training model that improves generalization to more data.
this is more powerful, but more difficult, if the training model also improves its own generalization.