‌
‌

Infra & cost

Prompt Caching

A feature that stores parts of a prompt the model has already processed, making repeat or follow-up requests much faster and cheaper.

01 ——

In plain English

Prompt caching is an inference optimisation where the AI provider stores the model's intermediate computation for the long, repeated parts of your prompt (system instructions, large documents, conversation history). On subsequent requests with the same prefix, the model skips re-processing it.

Why it matters:

Lower cost — cached tokens are typically 50–90% cheaper
Faster responses — time to first token can drop dramatically
Practical for RAG and agents — repeated context is the norm

When to use it:

Long system prompts or instructions
Document Q&A where many users ask different questions about the same doc
Agent loops that re-read the same tool definitions or memory
Multi-turn conversations

Supported by: Anthropic, OpenAI, Google, and most major providers, with slightly different APIs and pricing for cached vs fresh tokens.

02 ——

Related terms

The process of running a trained AI model to generate a response — as opposed to training the model.

The time it takes an AI model to respond to a request — from when you hit send to when the first or final word appears.

An interface that lets developers send requests to an AI model and get responses programmatically — the way most AI tools talk to LLMs.

The maximum amount of text (tokens) an AI model can read and remember at once during a single conversation.

Large Language Model — the type of AI behind tools like ChatGPT and Claude, trained to understand and generate text.

Back to glossaryLast reviewed June 2026

Vol. 4 · Issue 21 · Last reviewed 2026-06-27

Sign up for our newsletter

Receive weekly updates so you can stay up-to-date with the world of AI

Sign up for our newsletter

Receive weekly updates so you can stay up-to-date with the world of AI

AI Tools Directory

The AI tools directory for discovering, exploring, and comparing the most innovative AI tools in the industry

Explore

All AI tools

Top 100 AI tools

Best AI tools

Curated collections

AI tool alternatives

AI categories

Pricing

AI glossary

Compare AI tools

Blog

Methodology

Editorial team

AI graveyard

Research

MCP server

Latest collections

Policy

Terms & conditions

Privacy policy

FAQ

Refund policy

Affiliate disclosure