Core concepts

Extended Thinking

A model mode where the LLM spends extra compute reasoning through a problem before answering — trading latency for quality on hard tasks.

01 ——

In plain English

Extended Thinking is a feature of frontier models (Claude 4.x, OpenAI's o-series, Gemini Thinking) where the model generates a long internal chain of reasoning before producing the user-facing answer. The thinking is sometimes shown, sometimes hidden, but always paid for in tokens.

Why it helps: On hard problems — math, coding, multi-step reasoning — letting the model "think out loud" before committing to an answer dramatically improves accuracy. It's the practical application of test-time compute: more inference budget → better outputs on the same model.

Trade-offs:

  • Latency — 5–60+ seconds before the first user-visible token
  • Cost — every thinking token is billed
  • Not always helpful — easy queries don't benefit; some get worse

How products expose it:

  • Claude — "Extended thinking" toggle or budget parameter via the API
  • ChatGPT — model picker (o3, o3-mini, GPT-5 thinking variants)
  • Gemini — "Thinking" model variants
  • Perplexity — "Pro" / "Reasoning" toggles

Most production apps reserve extended thinking for tasks where users explicitly expect a slower, better answer.

02 ——

Related terms

Back to glossaryLast reviewed May 2026
Vol. 4 · Issue 19 · Last reviewed 2026-05-30

Sign up for our newsletter

Receive weekly updates so you can stay up-to-date with the world of AI