
Infra & cost

Latency

The time it takes an AI model to respond to a request — from when you hit send to when the first or final word appears.

01 ——

In plain English

Latency is how long an AI model takes to produce a response. It's measured in two key ways:

Time to first token (TTFT) — how quickly the model starts responding
Total response time — how long until the answer is fully generated
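
Both numbers can be read off a streamed response with a timer. The sketch below is one minimal way to do it, assuming the response arrives as an iterable of tokens; `fake_stream` is a hypothetical stand-in for a real model stream, and its delays are made up for illustration.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def measure_latency(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float, int]:
    """Return (ttft_seconds, total_seconds, token_count) for a token stream.

    The timer starts when this function is called, which stands in for
    the moment the request is sent.
    """
    start = clock()
    ttft = None
    count = 0
    for _ in tokens:
        if ttft is None:
            ttft = clock() - start  # the first token has just arrived
        count += 1
    total = clock() - start  # the stream is exhausted: answer fully generated
    return ttft, total, count


def fake_stream(n_tokens: int, first_delay: float, per_token: float) -> Iterator[str]:
    """Hypothetical stand-in for a streamed model response."""
    time.sleep(first_delay)  # simulated wait before the first token
    for i in range(n_tokens):
        if i > 0:
            time.sleep(per_token)  # simulated gap between later tokens
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, total, n = measure_latency(fake_stream(20, 0.05, 0.005))
    print(f"TTFT: {ttft * 1000:.0f} ms, total: {total * 1000:.0f} ms, tokens: {n}")
```

TTFT is `None` if the stream yields nothing. Passing a custom `clock` makes the function easy to test without real delays.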

Why latency matters:

User experience: a 5-second delay feels broken, while 200 ms feels instant
Cost of waiting: agents that take seconds per step compound into minutes over a multi-step task
Use case fit: real-time voice needs responses in under 500 ms; batch summarisation can tolerate much longer
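
The compounding effect for agents is plain multiplication. A small sketch, assuming the steps run one after another; the step count and per-step times are illustrative, not measurements.

```python
def agent_task_seconds(steps: int, seconds_per_step: float) -> float:
    """Total wall-clock time when each step waits for the previous one."""
    return steps * seconds_per_step


if __name__ == "__main__":
    # Hypothetical 40-step task at two different per-step latencies.
    for per_step in (0.2, 3.0):
        total = agent_task_seconds(steps=40, seconds_per_step=per_step)
        print(f"{per_step} s/step x 40 steps = {total:.0f} s ({total / 60:.1f} min)")
```

At 3 seconds per step, the same 40-step task takes two minutes, which is why per-step latency matters more for agents than for single-turn chat.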

What drives latency:

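Model size: larger models generally take longer to produce each token on the same hardware
Output length: tokens are generated one at a time, so longer answers take longer to finish
Prompt size: long prompts take longer to process before the first token appears
Provider load: shared endpoints can slow down when demand is high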
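
These drivers can be combined into a rough additive estimate. The sketch below is a simplified model, not a provider's formula: it assumes the prompt is processed at one constant rate, output tokens are generated at another, and load shows up as a fixed overhead. All parameter names and numbers are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class LatencyEstimate:
    ttft_s: float   # time until the first token appears
    total_s: float  # time until the full answer is generated


def estimate_latency(
    prompt_tokens: int,
    output_tokens: int,
    prompt_tokens_per_s: float,
    output_tokens_per_s: float,
    overhead_s: float = 0.0,
) -> LatencyEstimate:
    """Rough additive model; all rates are assumed inputs, not measurements."""
    # Prompt size and provider load (as overhead) delay the first token.
    ttft = overhead_s + prompt_tokens / prompt_tokens_per_s
    # Output length adds time on top, one token after another.
    total = ttft + output_tokens / output_tokens_per_s
    return LatencyEstimate(ttft_s=ttft, total_s=total)


if __name__ == "__main__":
    # Illustrative numbers only: a smaller model is modelled as higher rates.
    large = estimate_latency(2000, 300, prompt_tokens_per_s=5000,
                             output_tokens_per_s=50, overhead_s=0.1)
    small = estimate_latency(2000, 300, prompt_tokens_per_s=20000,
                             output_tokens_per_s=150, overhead_s=0.1)
    print(large)
    print(small)
```

In this model, a longer prompt raises both TTFT and total time, while a longer answer raises only the total.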

Common mitigations: smaller distilled models, streaming responses, prompt caching, and dedicated inference infrastructure.
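
To make one of these mitigations concrete, here is a toy illustration of the idea behind prompt caching. It is not how any provider implements the feature: the class name is hypothetical, whitespace-separated words stand in for tokens, and "processing" is only counted, not performed.

```python
class ToyPrefixCache:
    """Remembers which prompt prefixes have already been processed."""

    def __init__(self) -> None:
        self._seen = set()

    def tokens_to_process(self, prefix: str, suffix: str) -> int:
        """Count the words that still need processing for this request."""
        cost = len(suffix.split())  # the new part is always processed
        if prefix not in self._seen:
            cost += len(prefix.split())  # first time: process the prefix too
            self._seen.add(prefix)
        return cost


if __name__ == "__main__":
    cache = ToyPrefixCache()
    shared = "You are a support assistant for a hypothetical product"
    print(cache.tokens_to_process(shared, "How do I reset my password"))
    print(cache.tokens_to_process(shared, "Where is my invoice"))
```

The second request reuses the shared prefix, so only the new question counts toward the work, which is the source of the speed-up on repeat or follow-up requests.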

02 ——

Related terms

Inference: the process of running a trained AI model to generate a response, as opposed to training the model.

Streaming: sending an AI model's response token by token as it's generated, so the user sees text appear immediately instead of waiting for the full reply.

Prompt caching: a feature that stores parts of a prompt the model has already processed, making repeat or follow-up requests much faster and cheaper.

Distillation: training a smaller, cheaper AI model to mimic the outputs of a larger, more capable one, preserving most of the quality at a fraction of the cost.

API: an interface that lets developers send requests to an AI model and get responses programmatically. It is the way most AI tools talk to LLMs.

Last reviewed June 2026

Vol. 4 · Issue 21 · Last reviewed 2026-06-27
