Training

Distillation

Training a smaller, cheaper AI model to mimic the outputs of a larger, more capable one — preserving most of the quality at a fraction of the cost.

01 ——

In plain English

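Distillation is a technique where a small "student" model is trained to reproduce the behaviour of a larger "teacher" model. The student learns from the teacher's outputs: either the text the teacher generates, or the full probability distribution it assigns over possible answers or next tokens (so-called soft labels). This lets the student capture most of the teacher's capability while being far cheaper and faster to run.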
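Learning from the teacher's probabilities is most often done with the soft-target loss described by Hinton, Vinyals and Dean (2015). Below is a minimal pure-Python sketch of that recipe under stated assumptions. It is not the training code of any product named on this page. The function names (`softmax`, `kl_divergence`, `distillation_loss`, `fit_student_logits`), the temperature of 2.0, the weighting `alpha=0.5` and the example logits are all illustrative choices. Real systems apply the same loss to every example or token in a large dataset and update a full neural network rather than a single vector of scores.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature spreads probability more evenly."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q): how much information is lost when q is used to approximate p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, true_label, temperature=2.0, alpha=0.5):
    """Blend a soft-target term (match the teacher) with a hard-label term (match the data).

    The soft term is KL(teacher || student) at temperature T, multiplied by T**2 so its
    gradients stay on a similar scale as T changes. The hard term is ordinary
    cross-entropy against the true label at T = 1. alpha weights the two.
    temperature=2.0 and alpha=0.5 are illustrative defaults, not tuned values.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    hard = -math.log(softmax(student_logits)[true_label])
    return alpha * soft + (1 - alpha) * hard


def fit_student_logits(teacher_logits, temperature=2.0, lr=0.5, steps=200):
    """Toy 'student': a bare logit vector for one input, nudged by gradient descent
    until its softened distribution matches the teacher's (soft term only)."""
    student = [0.0] * len(teacher_logits)
    p_teacher = softmax(teacher_logits, temperature)
    for _ in range(steps):
        q_student = softmax(student, temperature)
        # d/dz_i of T**2 * KL(p_teacher || q_student) simplifies to T * (q_i - p_i)
        grads = [temperature * (q - p) for q, p in zip(q_student, p_teacher)]
        student = [z - lr * g for z, g in zip(student, grads)]
    return student


if __name__ == "__main__":
    teacher = [4.0, 1.5, 0.2]  # hypothetical teacher logits for a 3-class example
    student = fit_student_logits(teacher)
    print("teacher probs:", [round(p, 3) for p in softmax(teacher, 2.0)])
    print("student probs:", [round(p, 3) for p in softmax(student, 2.0)])
    print("loss, untrained student:", distillation_loss([0.0, 0.0, 0.0], teacher, true_label=0))
    print("loss, fitted student:", distillation_loss(student, teacher, true_label=0))
```

Raising the temperature flattens the teacher's distribution, so the student also learns how the teacher ranks the wrong answers (here, that class 1 is more plausible than class 2). That ranking carries more signal than the single correct label alone.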

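Why it matters: Frontier models (GPT-4, Claude Opus, Gemini Ultra) are expensive and slow. Smaller tiers such as GPT-4o-mini, Claude Haiku, and Gemini Flash, which are widely believed to rely on distillation (Google has confirmed it for Gemini 1.5 Flash), can deliver roughly 80–95% of the quality at 5–20% of the cost, depending on the task. That cost gap is what makes AI viable at scale.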

Common applications:

  • Production deployments that need low latency or high throughput
  • On-device AI — running models on phones or laptops
  • Open-source models that distil from closed frontier models (a contested practice)

Distillation is one of the main reasons AI tool pricing has dropped so dramatically since 2023.

Vol. 4 · Issue 19 · Last reviewed 2026-05-30
