Distillation
Training a smaller, cheaper AI model to mimic the outputs of a larger, more capable one — preserving most of the quality at a fraction of the cost.
In plain English
Distillation is a technique in which a small "student" model is trained to reproduce the behaviour of a larger "teacher" model. The student learns from the teacher's outputs, or from the probabilities the teacher assigns to each possible output, and so captures most of the teacher's capability while being far cheaper and faster to run.
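To make the idea of "learning from the teacher's probabilities" concrete, here is a minimal sketch in plain Python. It assumes a three-way classification task, and the scores are made up for illustration. The function names (`softmax`, `distillation_loss`) and the temperature value are hypothetical choices, not taken from any particular library. The loss is the KL divergence between the teacher's and the student's temperature-softened distributions, which is one common way to score how closely a student mimics a teacher. Real training setups usually combine it with a loss on the true labels and often rescale it by the squared temperature, both omitted here.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)  # teacher's "soft targets"
    q = softmax(student_logits, temperature)  # student's predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative scores only (assumed, not measured from any real model).
teacher = [4.0, 1.0, 0.5]
good_student = [3.8, 1.1, 0.4]   # close to the teacher
poor_student = [0.5, 4.0, 1.0]   # prefers a different class

loss_good = distillation_loss(teacher, good_student)
loss_poor = distillation_loss(teacher, poor_student)
print(loss_good < loss_poor)  # prints True
```

Training the student means adjusting its weights to push this loss towards zero across many examples. The loss is zero only when the student's softened probabilities match the teacher's exactly.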
Why it matters: Frontier models (GPT-4, Claude Opus, Gemini Ultra) are expensive and slow. Smaller models such as GPT-4o-mini, Claude Haiku, and Gemini Flash, which are widely reported to rely on distillation (vendors do not always disclose their training methods), deliver roughly 80–95% of the quality at 5–20% of the cost, making AI viable at scale.
Common applications:
- Production deployments that need low latency or high throughput
- On-device AI — running models on phones or laptops
- Open-source models that distil from closed frontier models (a contested practice)
Distillation is one of the main reasons the price of using AI models has dropped so sharply since 2023.