Small Language Model
A compact language model, typically 1B to 15B parameters, designed to run cheaply, quickly, or on-device while still being useful for focused tasks.
In plain English
A small language model (SLM) has billions of parameters rather than the hundreds of billions found in frontier models. SLMs trade peak capability for lower cost, lower latency, and the ability to run locally. Their quality has improved markedly since 2024, making them viable for many tasks that used to require a frontier model.
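A rough way to see why parameter count decides whether a model can run locally: the memory needed just to hold the weights is approximately the parameter count times the bytes stored per parameter. The sketch below is a back-of-the-envelope estimate, not a measurement. The function name `weight_memory_gb` and the precision table are illustrative assumptions, and a real deployment also needs room for the KV cache, activations, and quantization metadata.

```python
# Bytes needed to store one parameter at common precisions.
# int4 is idealised here; real 4-bit formats add a little overhead for scales.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str = "fp16") -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Billions of parameters times bytes per parameter gives gigabytes directly.
    """
    return params_billions * BYTES_PER_PARAM[precision]


if __name__ == "__main__":
    # Compare a few SLM-sized models with a much larger one.
    for size in (1, 3, 7, 14, 70):
        fp16 = weight_memory_gb(size, "fp16")
        int4 = weight_memory_gb(size, "int4")
        print(f"{size}B params: {fp16:.1f} GB at fp16, {int4:.1f} GB at int4")
```

Under these assumptions a 7B model needs about 14 GB at fp16 and about 3.5 GB at 4-bit, which puts quantized SLMs within reach of laptops and high-end phones, while a model with hundreds of billions of parameters stays out of reach.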
Popular SLM families (2025–26):
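- Microsoft Phi: Phi-3, Phi-4 (3.8B–14B), trained heavily on synthetic data
- Google Gemma: 2B, 9B, 27B variants
- Meta Llama: Llama 3.2 1B/3B; Llama 4 Scout (a mixture-of-experts model with 17B active parameters out of roughly 109B total, so small only in per-token compute)
- Alibaba Qwen: Qwen 2.5 0.5B through 7B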
- Mistral Small / Ministral family
- Apple Foundation Model (on-device)
Where SLMs win:
- On-device: phones, laptops, IoT (no cloud round-trip, no per-call API cost, data stays on the device)
- High-volume pipelines: millions of cheap classifications or extractions
- Latency-critical applications: voice agents, autocomplete
- Specialised tasks: fine-tuned SLMs often beat frontier models on the narrow task they were tuned for
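To make the high-volume case above concrete, here is a minimal cost sketch for a job billed per token. The per-token prices are hypothetical placeholders chosen only to illustrate the arithmetic, not quotes from any provider, and the function name `pipeline_cost_usd` is likewise an assumption.

```python
def pipeline_cost_usd(requests: int, tokens_per_request: int,
                      usd_per_million_tokens: float) -> float:
    """Total cost of a batch job that is billed per token."""
    total_tokens = requests * tokens_per_request
    return total_tokens / 1_000_000 * usd_per_million_tokens


# HYPOTHETICAL prices in USD per million tokens; substitute real ones.
SMALL_USD_PER_M = 0.10
FRONTIER_USD_PER_M = 5.00

if __name__ == "__main__":
    # Ten million short classification requests of about 500 tokens each.
    requests, tokens = 10_000_000, 500
    small = pipeline_cost_usd(requests, tokens, SMALL_USD_PER_M)
    frontier = pipeline_cost_usd(requests, tokens, FRONTIER_USD_PER_M)
    print(f"small model:    ${small:,.0f}")
    print(f"frontier model: ${frontier:,.0f}")
    print(f"ratio:          {frontier / small:.0f}x")
```

The absolute figures are made up, but cost scales linearly with the per-token price, so whatever ratio separates the two prices carries straight through to the bill for the whole job.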
On routine tasks, the gap between frontier and small models has narrowed faster than most predicted.