Claude Opus 4.8 lands: better at code, more honest and with sub-agents in parallel
Anthropic unveiled Claude Opus 4.8 this afternoon, its most capable production model. It arrives less than two months after Opus 4.7 and, most strikingly, without a price bump: it costs the same as the previous version.
The improvements in numbers
The benchmarks Anthropic published compare directly against Opus 4.7:
- Agentic coding: 64.3% → 69.2% — Long-running tasks where the model chains tool calls, reads files, runs tests and self-corrects.
- Multidisciplinary reasoning with tools: 54.7% → 57.9% — Problems requiring jumps between domains and the use of external tool context.
- Knowledge work: 1753 → 1890 — Anthropic’s internal metric for analysis, writing and synthesis tasks.
According to the data Anthropic shared, Opus 4.8 beats GPT-5.5 and Gemini 3.1 Pro on several of these benchmarks. On the internal Super-Agent benchmark, it’s the only model that completes every case end to end.
Fewer hallucinations, more “I don’t know”
The most relevant qualitative improvement doesn’t show up in the charts. Anthropic says Opus 4.8 is “significantly less likely to confidently pretend it knows something when it doesn’t.” Instead, it acknowledges the limits of its own knowledge or asks for more context.
It’s a behavioural change in line with recent industry models: less blind confidence, more honesty about uncertainty. If it holds up in real use, it tackles one of the most persistent problems agents have in production.
Dynamic workflows: sub-agents in parallel
Alongside the model, Anthropic introduced dynamic workflows, a feature that lets the agent spin up multiple sub-agents in parallel and coordinate their results. Aimed at tasks that can be decomposed and run simultaneously instead of sequentially.
It’s the missing piece for the “one agent orchestrating many agents” pattern to become first-class, instead of something every team has to build by hand on top of the API.
Effort control
Another novelty in this launch is a control panel that lets you adjust how much “effort” Claude puts into a response. More effort means more reasoning time and more tokens, but also more depth. Less effort means lower latency and lower cost, useful for fast or high-volume tasks.
Pricing and availability
Pricing stays identical to Opus 4.7:
- Input: $5 per million tokens.
- Output: $25 per million tokens.
- Fast mode: $10 input and $50 output per million tokens.
- Up to 90% savings with prompt caching and 50% with batch processing.
Available starting today on Anthropic’s direct API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry and all Claude products.
How this affects me
Yesterday I published a post on how I code in 2026 where I described that I direct Claude Code more than I type code. Today Anthropic swaps the model out from under me.
Three concrete things from this launch connect directly with that flow:
- Fewer hallucinations. In yesterday’s post I called out “the risk of accepting without reading” as one of the problems that still worries me. A model that better recognises what it doesn’t know attacks that problem at the source.
- Dynamic workflows. The small cron-scheduled scripts I’m experimenting with — that natural next step of having the agent ship things without me approving each move — get official primitives instead of having to be reinvented.
- Effort control. For small, repeated tasks — checking that a service is alive, generating the daily news bulletin — lowering effort and latency is an immediate optimisation.
I’ll need to use Opus 4.8 for a few days in real work to see whether the benchmarks translate into day-to-day improvements. I’ll come back to it.