
Infra & cost

Inference

The process of running a trained AI model to generate a response — as opposed to training the model.

01 ——

In plain English

Inference is what happens when you actually use an AI model. You send it a prompt, the model processes it through billions of parameters, and it outputs a response. Every time you chat with an AI, that's inference.

Training vs inference:

Training = teaching the model (done up front and only occasionally repeated, very expensive, requires massive compute)
Inference = using the model (done every time someone sends a request, so it is the ongoing cost)

Why it matters for AI tools: Inference is often the largest ongoing operating expense for AI products. Faster, cheaper inference (via model compression, batching, or specialised hardware) is what makes AI tools affordable at scale. When a provider quotes "tokens per second" or "latency", it is describing inference performance.
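To make "tokens per second" and "latency" concrete, here is a minimal Python sketch of how those figures, and a per-request cost, could be derived for a single request. The InferenceRun class, its field names, the pricing scheme (separate rates for input and output tokens), and every number are hypothetical and chosen only for illustration. They are not measurements or prices from any real provider.

```python
from dataclasses import dataclass


@dataclass
class InferenceRun:
    """One request to a model. All fields are illustrative assumptions."""
    prompt_tokens: int            # tokens sent to the model
    output_tokens: int            # tokens the model generated
    time_to_first_token_s: float  # latency until the first token appears
    total_time_s: float           # latency until the final token appears

    def tokens_per_second(self) -> float:
        # Approximate generation throughput: output tokens divided by
        # the time between the first token and the final token.
        generation_time = self.total_time_s - self.time_to_first_token_s
        return self.output_tokens / generation_time

    def cost_usd(self, usd_per_million_input: float,
                 usd_per_million_output: float) -> float:
        # Assumed token-based pricing with separate input and output rates.
        return (self.prompt_tokens * usd_per_million_input
                + self.output_tokens * usd_per_million_output) / 1_000_000


# Made-up request: 1,000 tokens in, 500 tokens out, 10.5 seconds in total.
run = InferenceRun(prompt_tokens=1_000, output_tokens=500,
                   time_to_first_token_s=0.5, total_time_s=10.5)

print(run.tokens_per_second())   # 50.0
print(run.cost_usd(2.0, 10.0))   # cost in USD at the assumed rates
```

Multiplying the per-request cost by the number of requests a product serves shows why small gains in inference efficiency add up at scale.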

02 ——

Related terms

Large Language Model: the type of AI behind tools like ChatGPT and Claude, trained to understand and generate text.

Token: the basic unit of text that AI models read and write, roughly ¾ of a word each. Models are priced and limited by token count.

Latency: the time it takes an AI model to respond to a request, from when you hit send to when the first or final word appears.

Streaming: sending an AI model's response token by token as it's generated, so the user sees text appear immediately instead of waiting for the full reply.

Quantization: shrinking an AI model by storing its weights in lower-precision numbers, making it smaller, faster, and cheaper, usually with little quality loss.
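The idea of storing weights in lower-precision numbers can be sketched in a few lines of Python. This is a simplified illustration that assumes one shared scale factor and 8-bit integers. The function names and example weights are hypothetical, and real systems use more elaborate schemes.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.023, 1.0]   # made-up example weights
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# The rounding step loses a little precision; the error per weight
# is at most half of the scale factor.
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(quantized)   # [50, -127, 2, 100]
print(max_error)
```

Each weight stored as an 8-bit integer takes a quarter of the space of a 32-bit float, and the price is the small rounding error measured by max_error.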

Last reviewed June 2026

Vol. 4 · Issue 21 · Last reviewed 2026-06-27
