Distillation

Training a smaller, cheaper AI model to mimic the outputs of a larger, more capable one — preserving most of the quality at a fraction of the cost.

01 ——

In plain English

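Distillation is a technique where a small "student" model is trained to reproduce the behaviour of a larger "teacher" model. The student learns from the teacher's outputs. These can be the final answers the teacher produces or, when they are accessible, the probability distributions it assigns over possible outputs (often called "soft labels"). The student captures most of the teacher's capability while being far cheaper and faster to run.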
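To make the "soft labels" idea concrete, here is a minimal Python sketch. It assumes the classic soft-target recipe described by Hinton et al. (2015). The teacher's and student's logits are softened with a temperature, and the student is penalised by the KL divergence from the teacher's softened distribution, plus an ordinary cross-entropy term on the true label. The two terms are weighted by `alpha`. For brevity, the "student" is a free vector of logits for a single input, standing in for a real network's output. All function names, the teacher logits, `temperature=2.0`, `alpha` and the learning rate are illustrative assumptions, not values from any production system.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms where p is zero contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical helper: weighted sum of a soft (match-the-teacher) and a hard (true-label) term."""
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    # Soft term, scaled by T^2 so its gradients keep a similar size as T changes.
    soft = temperature ** 2 * kl_divergence(teacher_soft, student_soft)
    # Hard term: ordinary cross-entropy against the true label at T = 1.
    hard = -math.log(softmax(student_logits)[true_label])
    return alpha * soft + (1 - alpha) * hard


def distillation_grad(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Analytic gradient of distillation_loss with respect to the student logits."""
    q = softmax(student_logits, temperature)   # student, softened
    p = softmax(teacher_logits, temperature)   # teacher, softened
    r = softmax(student_logits)                # student at T = 1
    grad = []
    for j in range(len(student_logits)):
        soft_g = temperature * (q[j] - p[j])               # d/ds_j of T^2 * KL(p || q)
        hard_g = r[j] - (1.0 if j == true_label else 0.0)  # d/ds_j of cross-entropy
        grad.append(alpha * soft_g + (1 - alpha) * hard_g)
    return grad


def train_student(teacher_logits, true_label, steps=300, lr=0.5,
                  temperature=2.0, alpha=0.5):
    """Toy training loop: plain gradient descent on a free vector of student logits."""
    student = [0.0] * len(teacher_logits)
    for _ in range(steps):
        g = distillation_grad(student, teacher_logits, true_label, temperature, alpha)
        student = [s - lr * gj for s, gj in zip(student, g)]
    return student


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]  # illustrative teacher logits for one input
    # alpha=1.0 keeps only the soft term, so the student should copy the teacher's distribution.
    student = train_student(teacher, true_label=0, alpha=1.0)
    print("teacher (T=2):", [round(x, 3) for x in softmax(teacher, 2.0)])
    print("student (T=2):", [round(x, 3) for x in softmax(student, 2.0)])
```

The temperature is the design choice that makes this work. It exposes how the teacher ranks the wrong answers, not just which answer it picks, which gives the student a richer training signal than one-hot labels. When only the teacher's generated text is available, as with a closed model behind an API, the usual alternative is to fine-tune the student directly on the teacher's outputs instead.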

Why it matters: Frontier models (GPT-4, Claude Opus, Gemini Ultra) are expensive and slow. Smaller tiers such as GPT-4o-mini, Claude Haiku and Gemini Flash are widely understood to rely on distillation. They deliver roughly 80–95% of the quality at 5–20% of the cost, which makes AI viable at scale.

Common applications:

Production deployments that need low latency or high throughput
On-device AI — running models on phones or laptops
Open-source models that distil from closed frontier models (a contested practice)

Distillation is one of the main reasons AI tool pricing has dropped so dramatically since 2023.

02 ——