Knowledge Distillation

Knowledge distillation is a model compression method in which a large, accurate “teacher” model trains a smaller “student” model. Instead of learning only from the ground-truth labels, the student also learns from the teacher’s softened outputs, typically temperature-scaled probability distributions that reveal how the teacher relates different classes. These richer signals help the student mimic the teacher’s behavior closely while using far fewer parameters and far less compute.
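A minimal sketch of the idea in PyTorch (the temperature T, mixing weight alpha, and function name are illustrative choices, not a fixed standard): the student is trained on a weighted sum of the usual cross-entropy against hard labels and a KL-divergence term that pulls its softened distribution toward the teacher’s.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with a soft-target KL term."""
    # Hard-label loss: standard cross-entropy against the ground truth.
    hard_loss = F.cross_entropy(student_logits, labels)

    # Soft-label loss: KL divergence between temperature-scaled distributions.
    # Multiplying by T*T keeps gradient magnitudes comparable as T changes.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    return alpha * hard_loss + (1.0 - alpha) * soft_loss
```

In a typical training loop, the teacher runs in evaluation mode with gradients disabled, and only the student’s parameters are updated using this combined loss.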

Teams often combine distillation with pruning or quantization to shrink models even further. A successful distilled model keeps accuracy and calibration close to the teacher while offering major improvements in size, speed, and energy use. Because of this, knowledge distillation has become a key technique for running advanced AI on phones, edge devices, and other hardware with limited resources.
