Claude Opus 4.8 lands: better at code, more honest and with sub-agents in parallel

Thu, 28 May 2026 00:00:00 +0200

Anthropic unveiled Claude Opus 4.8 this afternoon, its most capable production model. It arrives less than two months after Opus 4.7 and, most strikingly, without a price bump: it costs the same as the previous version.

The improvements in numbers

The benchmarks Anthropic published compare directly against Opus 4.7:

Agentic coding: 64.3% → 69.2% — Long-running tasks where the model chains tool calls, reads files, runs tests and self-corrects.
Multidisciplinary reasoning with tools: 54.7% → 57.9% — Problems requiring jumps between domains and the use of external tool context.
Knowledge work: 1753 → 1890 — Anthropic’s internal metric for analysis, writing and synthesis tasks.

According to the data Anthropic shared, Opus 4.8 beats GPT-5.5 and Gemini 3.1 Pro on several of these benchmarks. On the internal Super-Agent benchmark, it’s the only model that completes every case end to end.

Gemini 3.5 Flash: speed over intelligence for AI agents

Fri, 22 May 2026 00:00:00 +0200

Google presented Gemini 3.5 Flash at I/O on May 19th. Fast, cheap, optimised for agents. Good news for developers. But the most interesting thing is not what it does for Google — it’s who else is going to use it.

Apple has a multi-year agreement with Google to integrate Gemini models into Apple Intelligence. WWDC 2026 is on June 8th. All signs point to the new Siri — the one they’ve been promising for years and never quite delivering — running, at least in part, on this very model.

Agents on Sergio Comerón Blog

Claude Opus 4.8 lands: better at code, more honest and with sub-agents in parallel

The improvements in numbers

Gemini 3.5 Flash: speed over intelligence for AI agents