Infra & cost

Inference

The process of running a trained AI model to generate a response — as opposed to training the model.

01 ——

In plain English

Inference is what happens when you actually use an AI model. You send it a prompt, the model processes it through billions of parameters, and it outputs a response. Every time you chat with an AI, that's inference.

Training vs inference:

  • Training = teaching the model (done up front for each model version, very expensive, requires massive compute)
  • Inference = using the model (done on every request, so the cost is ongoing and grows with usage)
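The split can be sketched in a few lines of Python. This is a toy illustration only: it assumes a one-parameter model fitted by simple gradient steps, not a real neural network, and the names train and infer are hypothetical, chosen for this example.

```python
def train(examples, steps=200, lr=0.01):
    """Training: repeatedly adjust the weight to reduce error (the expensive part)."""
    w = 0.0
    for _ in range(steps):
        for x, y in examples:
            error = w * x - y
            w -= lr * error * x  # one gradient step on the squared error
    return w


def infer(w, x):
    """Inference: apply the already-learned weight to a new input. Nothing is learned here."""
    return w * x


# Toy data following y = 2x (an assumption made for this sketch).
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

weight = train(examples)                               # done once, many passes over the data
predictions = [infer(weight, x) for x in (4.0, 5.0)]   # done per request, one cheap pass each
```

Training loops over the data hundreds of times to find the weight; inference is a single multiplication per input. Real models follow the same pattern at a vastly larger scale.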

Why it matters for AI tools: Inference is typically the main ongoing operating cost for AI products, because it is paid on every request. Faster, cheaper inference (via model compression such as quantisation, request batching, or specialised hardware) is what makes AI tools affordable at scale. When a provider mentions "tokens per second" (how quickly output is generated) or "latency" (how long you wait for a response), they're talking about inference performance.
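As a rough illustration of how these numbers relate, here is a minimal Python sketch. Everything in it is an assumption made for the example: the RequestStats fields, the per-million-token pricing model, and the prices themselves are hypothetical and do not describe any particular provider.

```python
from dataclasses import dataclass


@dataclass
class RequestStats:
    """Measurements for one inference request (hypothetical field names)."""
    input_tokens: int
    output_tokens: int
    seconds_to_first_token: float  # wait before any output appears
    total_seconds: float           # wait until the full response has arrived


def tokens_per_second(stats: RequestStats) -> float:
    """Output throughput: generated tokens divided by the time spent generating them."""
    generation_time = stats.total_seconds - stats.seconds_to_first_token
    if generation_time <= 0:
        raise ValueError("total_seconds must exceed seconds_to_first_token")
    return stats.output_tokens / generation_time


def request_cost(stats: RequestStats,
                 input_price_per_million: float,
                 output_price_per_million: float) -> float:
    """Cost of one request, assuming pricing is quoted per million tokens."""
    total = (stats.input_tokens * input_price_per_million
             + stats.output_tokens * output_price_per_million)
    return total / 1_000_000


# Made-up numbers: 1,000 prompt tokens in, 500 tokens out, 10.5 seconds end to end.
stats = RequestStats(input_tokens=1000, output_tokens=500,
                     seconds_to_first_token=0.5, total_seconds=10.5)

speed = tokens_per_second(stats)      # 500 tokens over 10 seconds of generation
cost = request_cost(stats, 1.0, 4.0)  # hypothetical prices: 1.00 in, 4.00 out per million tokens
```

With these made-up figures the request generates 50 tokens per second and costs 0.003 in whatever currency the prices are quoted in. Multiplying that per-request cost by daily request volume shows why small gains in inference efficiency matter at scale.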


Last reviewed May 2026
Vol. 4 · Issue 19 · Last reviewed 2026-05-30
