‌
‌

Core concepts

Extended Thinking

A model mode where the LLM spends extra compute reasoning through a problem before answering — trading latency for quality on hard tasks.

01 ——

In plain English

Extended Thinking is a feature of frontier models (Claude 4.x, OpenAI's o-series, Gemini Thinking) where the model generates a long internal chain of reasoning before producing the user-facing answer. The thinking is sometimes shown, sometimes hidden, but always paid for in tokens.

Why it helps: On hard problems — math, coding, multi-step reasoning — letting the model "think out loud" before committing to an answer dramatically improves accuracy. It's the practical application of test-time compute: more inference budget → better outputs on the same model.

Trade-offs:

Latency — 5–60+ seconds before the first user-visible token
Cost — every thinking token is billed
Not always helpful — easy queries don't benefit; some get worse

How products expose it:

Claude — "Extended thinking" toggle or budget parameter via the API
ChatGPT — model picker (o3, o3-mini, GPT-5 thinking variants)
Gemini — "Thinking" model variants
Perplexity — "Pro" / "Reasoning" toggles

Most production apps reserve extended thinking for tasks where users explicitly expect a slower, better answer.

02 ——

Related terms

Reasoning Model

A model variant trained or tuned to spend more compute on internal reasoning before answering — better on math, code, and multi-step problems.

Test-time Compute

The amount of compute spent at inference time on a single response — increased dramatically by reasoning models to improve quality.

Chain of Thought

A prompting technique where you ask the AI to "think step by step" before giving an answer — usually leading to better reasoning.

The process of running a trained AI model to generate a response — as opposed to training the model.

The time it takes an AI model to respond to a request — from when you hit send to when the first or final word appears.

The basic units of text that AI models read and write — roughly ¾ of a word each. Models are priced and limited by token count.

Back to glossaryLast reviewed June 2026

Vol. 4 · Issue 21 · Last reviewed 2026-06-27

Sign up for our newsletter

Receive weekly updates so you can stay up-to-date with the world of AI

Sign up for our newsletter

Receive weekly updates so you can stay up-to-date with the world of AI

AI Tools Directory

The AI tools directory for discovering, exploring, and comparing the most innovative AI tools in the industry

Explore

All AI tools

Top 100 AI tools

Best AI tools

Curated collections

AI tool alternatives

AI categories

Pricing

AI glossary

Compare AI tools

Blog

Methodology

Editorial team

AI graveyard

Research

MCP server

Latest collections

Policy

Terms & conditions

Privacy policy

FAQ

Refund policy

Affiliate disclosure