Core concepts

Small Language Model

A compact language model, typically 1B to 15B parameters, designed to run cheaply, quickly, or on-device while still being useful for focused tasks.

01 ——

In plain English

A small language model (SLM) is a language model with far fewer parameters than a frontier model: usually a few billion rather than hundreds of billions. SLMs trade peak capability for lower cost, lower latency, and the ability to run locally. They have improved dramatically since 2024, which makes them viable for many tasks that used to require a frontier model.

Popular SLM families (2025–26):

Microsoft Phi — Phi-3, Phi-4 (3B–14B), trained heavily on synthetic data
Google Gemma — Gemma 2 in 2B, 9B, and 27B variants
Meta Llama — Llama 3.2 1B/3B, Llama 4 Scout (mixture-of-experts: 17B active parameters, 109B total)
Alibaba Qwen — Qwen 2.5 0.5B through 7B
Mistral Small / Ministral family
Apple Foundation Model (on-device)

Where SLMs win:

On-device — phones, laptops, IoT (no cloud round-trip, no API cost, data stays on the device)
High-volume pipelines — millions of cheap classifications or extractions
Latency-critical — voice agents, autocomplete
Specialised tasks — fine-tuned SLMs often beat frontier models on the narrow task they were tuned for
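
To make the on-device point concrete, here is a rough sketch of why parameter count matters for local deployment. It assumes weight memory is roughly parameters times bits per parameter divided by 8, and it ignores the KV cache, activations, and runtime overhead, so real usage will be higher. The function name and the precision labels are illustrative, not taken from any particular library.

```python
# Rough weight-memory estimate: parameters x bits per parameter / 8.
# Assumption: only the weights are counted. KV cache, activations and
# runtime overhead are ignored, so actual memory use is higher.

BITS_PER_PARAM = {"fp16": 16, "int8": 8, "int4": 4}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    bits = BITS_PER_PARAM[precision]
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    # Sizes in billions of parameters, chosen to span the typical SLM range.
    for billions in (1, 3, 7, 14):
        row = ", ".join(
            f"{precision}: {weight_memory_gb(billions * 1e9, precision):.1f} GB"
            for precision in BITS_PER_PARAM
        )
        print(f"{billions}B params -> {row}")
```

Under these assumptions, a 7B model needs about 14 GB for its weights at 16-bit precision and about 3.5 GB at 4-bit, which is roughly the difference between needing a dedicated GPU and fitting on a laptop or high-end phone.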

On routine tasks, the gap between frontier and small models has narrowed quickly.

02 ——