
Infra & cost

Quantization

Shrinking an AI model by storing its weights in lower-precision numbers — making it smaller, faster, and cheaper with minimal quality loss.

01 ——

In plain English

Quantization compresses an AI model by reducing the precision of its numbers — for example, converting 32-bit floating-point weights to 8-bit or 4-bit integers. The model becomes much smaller and faster to run, usually with only a small drop in quality.
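To make the idea concrete, here is a minimal sketch of one common approach, symmetric per-tensor INT8 quantization, written in plain Python. The function names and the sample weights are illustrative assumptions, not part of any particular library, and production toolchains use more elaborate schemes (per-channel or per-group scales, calibration data).

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole list; guard against an all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate floats from the stored integers and the shared scale."""
    return [q * scale for q in quantized]


# Illustrative values only, not taken from a real model.
weights = [0.823, -1.27, 0.051, 0.404]

quantized, scale = quantize_int8(weights)
restored = dequantize_int8(quantized, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]

print(quantized)  # [82, -127, 5, 40]
print(restored)   # close to the originals, but not identical
print(max(errors) <= scale / 2 + 1e-12)  # rounding error is at most half a step
```

Each weight is now stored as one small integer plus a single shared scale. The rounding error per weight is at most half the scale, which is the source of the small quality drop.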

Why it matters:

Run on smaller hardware: quantized models can fit on a laptop, phone, or single GPU
Lower inference cost: smaller weights mean less memory traffic and faster math, so each chip can serve more requests per second
Edge AI: quantization is essential for on-device models, such as those behind Apple Intelligence

Common levels:

FP16 / BF16: 16-bit floating point (half precision), near-original quality
INT8: 8-bit integers, usually negligible quality loss
INT4: 4-bit integers, significant compression with a small but noticeable quality drop
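The difference between levels can be illustrated with a small experiment: quantize the same weights to 8 and 4 bits and compare the round-trip error. This is a toy sketch under stated assumptions. The helper `roundtrip_error` and the sample weights are hypothetical, and real 4-bit formats typically use per-group scales and calibration, which this sketch does not attempt.

```python
def roundtrip_error(weights, bits):
    """Mean absolute error after symmetric quantization to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    total = 0.0
    for w in weights:
        # Round to the nearest representable integer, then map back to a float.
        q = max(-qmax, min(qmax, round(w / scale)))
        total += abs(w - q * scale)
    return total / len(weights)


# Illustrative values only, not taken from a real model.
weights = [0.823, -1.27, 0.051, 0.404, -0.66, 0.119]

err_int8 = roundtrip_error(weights, 8)
err_int4 = roundtrip_error(weights, 4)

print(f"INT8 mean error: {err_int8:.4f}")
print(f"INT4 mean error: {err_int4:.4f}")
print(err_int4 > err_int8)  # fewer bits means coarser steps and larger error
```

With only 15 representable values per weight at 4 bits (against 255 at 8 bits), each step is much coarser, which is why INT4 usually costs more quality than INT8.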

Most open-weight models you can download (Llama, Mistral, Qwen) ship in multiple quantized variants so you can pick the size that fits your hardware.
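A rough way to choose a variant is to estimate weight storage as parameters × bits ÷ 8 and compare it with available memory. The sketch below assumes a hypothetical 7-billion-parameter model. The names `weight_size_gb` and `pick_variant` are illustrative, and the estimate covers weights only: it ignores activations, the KV cache, and the small overhead of stored scales.

```python
BITS_PER_WEIGHT = {"FP32": 32, "FP16": 16, "INT8": 8, "INT4": 4}


def weight_size_gb(num_params, bits):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


def pick_variant(num_params, memory_budget_gb):
    """Return the highest-precision level whose weights fit the budget, or None."""
    by_precision = sorted(BITS_PER_WEIGHT.items(), key=lambda kv: -kv[1])
    for name, bits in by_precision:
        if weight_size_gb(num_params, bits) <= memory_budget_gb:
            return name
    return None


params = 7_000_000_000  # hypothetical 7-billion-parameter model

sizes = {name: weight_size_gb(params, bits) for name, bits in BITS_PER_WEIGHT.items()}
choice_16gb = pick_variant(params, 16)
choice_6gb = pick_variant(params, 6)
choice_2gb = pick_variant(params, 2)

for name, size in sizes.items():
    print(f"{name}: {size:.1f} GB")
print(choice_16gb, choice_6gb, choice_2gb)
```

Under these assumptions the weights take 28 GB at FP32, 14 GB at FP16, 7 GB at INT8, and 3.5 GB at INT4, so a 16 GB budget selects FP16, a 6 GB budget selects INT4, and a 2 GB budget fits none of them.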

02 ——

Related terms

Distillation

Training a smaller, cheaper AI model to mimic the outputs of a larger, more capable one, preserving most of the quality at a fraction of the cost.

Fine-tuning

Further training a pre-trained AI model on your own data to specialise it for a specific task or style.

LoRA

Low-Rank Adaptation: a cheap way to fine-tune large AI models by training a small set of extra weights instead of the whole model.

Inference

The process of running a trained AI model to generate a response, as opposed to training the model.

Open-weight Model

An AI model whose trained weights are publicly released, so anyone can download, run, or fine-tune it themselves.

Last reviewed June 2026

Vol. 4 · Issue 21 · Last reviewed 2026-06-27
