
Knowledge distillation is the process of training a student neural network on the outputs of a teacher neural network so that the student adopts the teacher's behavior. The technique has proven broadly useful, and many variants have been studied; a minimal version of the usual training objective is sketched below. The more powerful approaches fold distillation into the training loop itself, for example by spending more training time on inputs where the two models' behaviors diverge most, or by training the teacher explicitly so that it influences the student more effectively. This latter idea is usually framed as a form of meta-learning (sometimes called meta-distillation), and it can involve a trial student that exists only to provide a learning signal for the teacher and is then discarded; a second sketch after the first code block illustrates this bi-level setup. There still seems to be more to learn in this space around knowledge distillation and related techniques.
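
To make the basic setup concrete, here is a minimal sketch of the standard distillation objective in PyTorch: a weighted sum of cross-entropy on hard labels and a KL term matching the student's softened outputs to the teacher's. The function name, the temperature, and the weighting `alpha` are illustrative choices, not taken from any particular implementation.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and a soft-target KL term."""
    # Soften both distributions with the same temperature.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures,
    # a standard convention in distillation objectives.
    kd = F.kl_div(soft_student, soft_targets, reduction="batchmean") * temperature ** 2
    # Ordinary supervised loss on the hard labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```

In a training loop, `student_logits` would come from the student's forward pass and `teacher_logits` from the frozen teacher, typically computed under `torch.no_grad()`.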
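
The teacher-trained-to-teach idea can be sketched as a bi-level update: a trial student takes one differentiable gradient step on the distillation loss, the resulting trial student is evaluated on the real task, and that evaluation loss is pushed back through the inner step into the teacher's weights before the trial student is discarded. Everything below (single linear layers standing in for full networks, the learning rates, the tensor names) is an illustrative assumption meant only to keep the two levels of the update visible.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(32, 10)          # a stand-in batch of inputs
y = torch.randint(0, 3, (32,))   # hard labels for the outer (real-task) objective

# Teacher and student are single linear layers so the functional update is explicit.
teacher_w = torch.randn(10, 3, requires_grad=True)
student_w = torch.randn(10, 3, requires_grad=True)
teacher_opt = torch.optim.SGD([teacher_w], lr=1e-2)
inner_lr = 0.1

# Inner step: the trial student takes one gradient step on the distillation loss
# (KL between the student's log-probabilities and the teacher's probabilities).
teacher_logits = x @ teacher_w
student_logits = x @ student_w
distill_loss = F.kl_div(
    F.log_softmax(student_logits, dim=-1),
    F.softmax(teacher_logits, dim=-1),
    reduction="batchmean",
)
# create_graph=True keeps the inner step differentiable with respect to the teacher.
grad_student = torch.autograd.grad(distill_loss, student_w, create_graph=True)[0]
trial_student_w = student_w - inner_lr * grad_student

# Outer step: evaluate the trial student on the real task and backpropagate
# that loss through the inner step into the teacher's weights.
trial_logits = x @ trial_student_w
outer_loss = F.cross_entropy(trial_logits, y)
teacher_opt.zero_grad()
outer_loss.backward()
teacher_opt.step()

# The trial student is now discarded; the updated teacher is what gets used
# to distill into the student that is actually kept.
```

The design choice that matters here is `create_graph=True`: without it the teacher receives no gradient from the trial student's performance, and the outer update collapses to nothing.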