<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Agents on Sergio Comerón Blog</title>
    <link>https://sergiocomeron.com/blog/en/tags/agents/</link>
    <description>Recent content in Agents on Sergio Comerón Blog</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Thu, 28 May 2026 00:00:00 +0200</lastBuildDate>
    <atom:link href="https://sergiocomeron.com/blog/en/tags/agents/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Claude Opus 4.8 lands: better at code, more honest and with sub-agents in parallel</title>
      <link>https://sergiocomeron.com/blog/en/posts/claude-opus-4-8/</link>
      <pubDate>Thu, 28 May 2026 00:00:00 +0200</pubDate>
      <guid>https://sergiocomeron.com/blog/en/posts/claude-opus-4-8/</guid>
      <description>&lt;p&gt;Anthropic unveiled Claude Opus 4.8 this afternoon, its most capable production model. It arrives less than two months after Opus 4.7 and, most strikingly, without a price bump: it costs the same as the previous version.&lt;/p&gt;&#xA;&lt;h2 id=&#34;the-improvements-in-numbers&#34;&gt;The improvements in numbers&lt;/h2&gt;&#xA;&lt;p&gt;The benchmarks Anthropic published compare directly against Opus 4.7:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;Agentic coding&lt;/strong&gt;: 64.3% → &lt;strong&gt;69.2%&lt;/strong&gt; — Long-running tasks where the model chains tool calls, reads files, runs tests and self-corrects.&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Multidisciplinary reasoning with tools&lt;/strong&gt;: 54.7% → &lt;strong&gt;57.9%&lt;/strong&gt; — Problems requiring jumps between domains and the use of external tool context.&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Knowledge work&lt;/strong&gt;: 1753 → &lt;strong&gt;1890&lt;/strong&gt; — Anthropic&amp;rsquo;s internal metric for analysis, writing and synthesis tasks.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;According to the data Anthropic shared, Opus 4.8 beats GPT-5.5 and Gemini 3.1 Pro on several of these benchmarks. On the internal Super-Agent benchmark, it&amp;rsquo;s the only model that completes every case end to end.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Gemini 3.5 Flash: speed over intelligence for AI agents</title>
      <link>https://sergiocomeron.com/blog/en/posts/gemini-35-flash/</link>
      <pubDate>Fri, 22 May 2026 00:00:00 +0200</pubDate>
      <guid>https://sergiocomeron.com/blog/en/posts/gemini-35-flash/</guid>
      <description>&lt;p&gt;Google presented Gemini 3.5 Flash at I/O on May 19th. Fast, cheap, optimised for agents. Good news for developers. But the most interesting thing is not what it does for Google — it&amp;rsquo;s who else is going to use it.&lt;/p&gt;&#xA;&lt;p&gt;Apple has a multi-year agreement with Google to integrate Gemini models into Apple Intelligence. WWDC 2026 is on June 8th. All signs point to the new Siri — the one they&amp;rsquo;ve been promising for years and never quite delivering — running, at least in part, on this very model.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
