Core concepts

Small Language Model

A compact language model, typically around 1B to 15B parameters, designed to run cheaply, quickly, or on-device while still being useful for focused tasks.

01 ——

In plain English

A small language model (SLM) is a language model with far fewer parameters than a frontier model: usually billions rather than hundreds of billions. SLMs trade peak capability for lower cost, lower latency, and the ability to run locally. They have improved dramatically since 2024 and are now viable for many tasks that used to require a frontier model.
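To make "able to run locally" concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone occupy at common precisions. The helper name `weight_memory_gb` is made up for illustration, and the estimate ignores the KV cache, activations, and runtime overhead, so real usage will be higher.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    # bits -> bytes (divide by 8), bytes -> GB (divide by 1e9)
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    # Illustrative sizes only; they are not tied to any specific model.
    for name, n_params in [("1B", 1e9), ("3B", 3e9), ("14B", 14e9)]:
        for bits in (16, 8, 4):
            gb = weight_memory_gb(n_params, bits)
            print(f"{name} model @ {bits}-bit: ~{gb:.1f} GB of weights")
```

Under these assumptions, a 3B model quantised to 4 bits needs about 1.5 GB for its weights, while a 14B model at 16 bits needs about 28 GB, which is why the smaller sizes and lower precisions are the ones that tend to fit on phones and laptops.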

Popular SLM families (2025–26):

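  • Microsoft Phi: Phi-3 and Phi-4 (3B–14B), trained heavily on synthetic data
  • Google Gemma: 2B, 9B, and 27B variants (Gemma 2)
  • Meta Llama: Llama 3.2 1B/3B; Llama 4 Scout (a mixture-of-experts model with 17B active parameters out of 109B total, so small only in per-token compute)
  • Alibaba Qwen: Qwen 2.5, 0.5B through 7B
  • Mistral: Mistral Small and the Ministral family
  • Apple Foundation Model (on-device)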

Where SLMs win:

  • On-device: phones, laptops, IoT (no cloud round-trip, no per-call API cost, and data never leaves the device)
  • High-volume pipelines: millions of cheap classifications or extractions
  • Latency-critical uses: voice agents, autocomplete
  • Specialised tasks: fine-tuned SLMs often beat frontier models on the narrow task they were tuned for
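One common way to use these strengths in a high-volume pipeline is a cascade: send every request to the small model first and escalate only the low-confidence cases to a larger one. The sketch below is a minimal illustration, not a reference implementation. `small_model`, `large_model`, and the 0.8 threshold are hypothetical stand-ins; a real system would call actual models and tune the threshold on held-out data.

```python
from typing import Tuple


def small_model(text: str) -> Tuple[str, float]:
    """Hypothetical stand-in for an SLM classifier: returns (label, confidence)."""
    if "refund" in text.lower():
        return "billing", 0.95
    return "other", 0.40


def large_model(text: str) -> str:
    """Hypothetical stand-in for a frontier model, used only as a fallback."""
    return "needs_review"


def classify(text: str, threshold: float = 0.8) -> Tuple[str, str]:
    """Return (label, which_model_answered) using a small-first cascade."""
    label, confidence = small_model(text)
    if confidence >= threshold:
        # Cheap path: the small model is confident enough to answer alone.
        return label, "small"
    # Expensive path: escalate the uncertain case to the larger model.
    return large_model(text), "large"


if __name__ == "__main__":
    for message in ["I want a refund for my order", "hello there"]:
        print(message, "->", classify(message))
```

The design choice is that the expensive model is paid for only on the fraction of traffic the small model cannot handle confidently, so the savings depend on how much of the workload is routine.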

The frontier-vs-small gap on routine tasks has shrunk faster than most predicted.


Vol. 4 · Issue 19 · Last reviewed 2026-05-30
