Anthropic ships Deep Thinking mode for Claude 4.7, sets the bar for reasoning compute
Claude Pro users now get a toggle that lets the model spend up to 30 minutes on a single query. We tested it against GPT-5 and Gemini 3 Pro on hard math, code, and policy questions. The results were uneven, and the bill was not.
The short version. Anthropic has rolled out Deep Thinking, a new high-compute reasoning mode in Claude 4.7, to all Pro and Max subscribers. It is the company's clearest answer yet to OpenAI's o-series and Google's Gemini 3 Pro Thinking. On three benchmarks we ran, Claude won one, lost one, and tied one. On price, it remains the most expensive of the three to operate at scale.
What changed
Until last week, Claude users could nudge the model into longer reasoning by asking it to "think step by step" or by using extended thinking via the API. Deep Thinking is different. It is a toggle in the Claude.ai sidebar that allocates a budget of up to 30 minutes of compute to a single response, and shows the model's deliberation in real time.
According to Anthropic's release notes, Deep Thinking uses the same underlying Claude 4.7 weights as standard mode. The difference is in how the inference loop is structured: more rollouts, more verification passes, and a stronger self-critique step before the final answer is emitted.
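Anthropic has not published the details of that loop, but the shape it describes is familiar: sample several candidate answers, score each with a verifier, then run a self-critique pass on the winner. A minimal sketch, with every function a stand-in for a model call (none of these names come from Anthropic's API):

```python
import random

def deep_thinking_answer(prompt, n_rollouts=4, seed=0):
    """Illustrative best-of-n loop: sample candidate answers, score each
    with a verification pass, then self-critique the winner before
    emitting it. Every inner function is a stand-in for a model call."""
    rng = random.Random(seed)

    def sample_rollout(prompt):          # stand-in for one model rollout
        return f"candidate answer ({rng.random():.3f})"

    def verify(prompt, answer):          # stand-in for a verification pass
        return rng.random()              # higher = judged more consistent

    def self_critique(prompt, answer):   # stand-in for the critique step
        return answer + " [revised after self-critique]"

    candidates = [sample_rollout(prompt) for _ in range(n_rollouts)]
    scored = [(verify(prompt, c), c) for c in candidates]
    best_score, best = max(scored)
    return self_critique(prompt, best)

print(deep_thinking_answer("Prove the sequence is eventually periodic."))
```

The point of the sketch is the cost structure, not the scoring: every extra rollout and verification pass is another full inference call, which is why a fixed wall-clock budget (up to 30 minutes) is the natural knob to expose.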
How it stacks up
We ran three workloads against Claude 4.7 Deep Thinking, GPT-5 with maximum reasoning effort, and Gemini 3 Pro Thinking.
- Hard math. A 2026 Putnam-style problem on integer sequences. Claude solved it in eight minutes. GPT-5 solved it in three. Gemini 3 Pro solved it in twelve but with a cleaner proof.
- Refactoring. A 4,000-line TypeScript module with a known concurrency bug. Claude found and fixed the bug in eleven minutes. GPT-5 found it but introduced a regression. Gemini 3 Pro did not converge.
- Policy analysis. A 60-page draft of the proposed Pakistan Personal Data Protection Bill. Claude produced the most defensible summary, with footnoted citations to specific clauses. GPT-5 hallucinated one clause number. Gemini 3 Pro produced the most readable prose but missed two material provisions.
What it costs
Deep Thinking is included in the $20/month Pro tier and the $200/month Max tier, but with strict limits. Heavy use will push most builders to the API, where reasoning tokens are billed at a multiple of standard output tokens. Our estimate, based on Anthropic's published pricing, is that a serious Deep Thinking workload costs three to five times as much as the same query on Claude Sonnet 4.6.
Use our AI Cost Calculator to see what this looks like for your own monthly token volume.
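The arithmetic behind that three-to-five-times estimate is simple: reasoning tokens dominate the bill once a query emits thousands of them. A back-of-the-envelope sketch with illustrative prices (these are not Anthropic's published rates, and the token counts are assumptions):

```python
def monthly_cost(queries, in_tokens, out_tokens, reasoning_tokens,
                 in_price, out_price, reasoning_multiplier):
    """Estimate monthly spend in dollars. Prices are per million tokens;
    reasoning tokens bill at a multiple of the output-token price.
    All figures are illustrative, not published rates."""
    per_query = (in_tokens * in_price
                 + out_tokens * out_price
                 + reasoning_tokens * out_price * reasoning_multiplier) / 1e6
    return queries * per_query

# Hypothetical comparison: 1,000 queries/month on a standard model vs. a
# high-compute mode that emits 5,000 reasoning tokens per query.
standard = monthly_cost(1000, 2000, 1000, 0,
                        in_price=3.0, out_price=15.0, reasoning_multiplier=1.0)
deep = monthly_cost(1000, 2000, 1000, 5000,
                    in_price=3.0, out_price=15.0, reasoning_multiplier=1.0)
print(f"standard ~ ${standard:.2f}/mo, deep thinking ~ ${deep:.2f}/mo")
# Under these assumptions the reasoning workload lands in the 3-5x range.
```

Swap in your own token volumes and the current price sheet to get a number you can act on.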
The Pakistan angle
The bigger story for builders in Pakistan is not the benchmark numbers. It is that all three frontier reasoning models are now consumable from Karachi, Lahore, and Islamabad without a VPN, after Anthropic enabled direct payments via VisaNet and Mastercard's Pakistan corridor in March. For the first time, a developer in Pakistan can buy frontier reasoning compute at the same price as one in San Francisco. The economic gap is in salaries, not access.
What we'll be watching
Anthropic has not said how Deep Thinking interacts with its planned web search and tool-use upgrades, which are expected at Anthropic Build later this year. We expect those to matter more than the benchmark wins. Reasoning that can call tools is qualitatively different from reasoning that cannot.
We'll have more on this as it ships.
One email. The week in AI, Pakistan tech, and global business.
Curated by Faizan Khan. No filler. Unsubscribe in one click.
Faizan Ali Khan is the Founder and Editor of Meridian48 and the Founder of Cubitrek, a technology consulting practice. He writes about AI, Pakistan's technology economy, and the business of innovation.