Gemini 3.5 Flash: speed over intelligence for AI agents

Google presented Gemini 3.5 Flash at I/O on May 19th. Fast, cheap, optimised for agents. Good news for developers. But the most interesting thing is not what it does for Google — it’s who else is going to use it.

Apple has a multi-year agreement with Google to integrate Gemini models into Apple Intelligence. WWDC 2026 is on June 8th. All signs point to the new Siri — the one they’ve been promising for years and never quite delivering — running, at least in part, on this very model.

The irony is hard to miss: Apple, the company that has pushed hardest on privacy and on-device processing, delegating the brain of its assistant to the servers of its biggest competitor in search.

But to understand why Flash specifically, and not another Google model, you first need to understand how AI agents work.

What is an agent

A normal language model answers a question. An agent chains actions together: it queries an API, reads the result, decides what to do next, calls another API, checks whether it's done, and repeats until the task is complete.

Imagine asking an agent to book a flight from Madrid to Tokyo for next week, within a budget of 900 euros, with a maximum one-hour layover. The agent doesn’t return an answer — it executes a plan:

  1. Query the flights API with the parameters
  2. Filter results by price
  3. Check layovers for each option
  4. Verify availability on the exact dates
  5. Compare with a second source to cross-check prices
  6. Select the best option
  7. Start the booking process
  8. Confirm passenger details
  9. Complete payment
  10. Generate a summary with the details

At one model call per step, that's ten model calls. In a real task it can be double that.
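To make the idea concrete, here is a minimal sketch of that loop in Python. It assumes one model call per step and a fixed plan; `run_agent`, `AgentRun` and `fake_model` are hypothetical names invented for illustration, and a real agent would let the model choose the next step rather than follow a hard-coded list.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentRun:
    """Records what the agent did, one entry per model call."""
    steps: list[str] = field(default_factory=list)
    model_calls: int = 0


def run_agent(plan: list[str], call_model: Callable[[str], str]) -> AgentRun:
    """Execute each step of the plan with one model call, in order."""
    run = AgentRun()
    for step in plan:
        result = call_model(step)  # in production: a network round trip
        run.model_calls += 1
        run.steps.append(f"{step} -> {result}")
    return run


def fake_model(prompt: str) -> str:
    """Stand-in for a real model API (hypothetical): always succeeds."""
    return "ok"


FLIGHT_PLAN = [
    "Query the flights API with the parameters",
    "Filter results by price",
    "Check layovers for each option",
    "Verify availability on the exact dates",
    "Compare with a second source to cross-check prices",
    "Select the best option",
    "Start the booking process",
    "Confirm passenger details",
    "Complete payment",
    "Generate a summary with the details",
]

run = run_agent(FLIGHT_PLAN, fake_model)
print(run.model_calls)  # 10
```

Every pass through the loop is a separate round trip to the model, which is why per-call latency and price matter so much in what follows.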

Why speed matters more than intelligence

Here’s the problem with using a very capable but slow model for an agent: latency compounds.

If each call takes three seconds, ten calls mean thirty seconds of waiting, and twenty calls mean a full minute. In production, with real users waiting, that's unacceptable. With Gemini 3.5 Flash, those same calls take under a second each, so the same twenty-step task drops from a full minute to under twenty seconds.
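The arithmetic is simple enough to check in a few lines. This sketch assumes the calls run strictly one after another and uses one second as an upper bound for "under a second"; `total_latency` is a hypothetical helper, not part of any SDK.

```python
def total_latency(calls: int, seconds_per_call: float) -> float:
    """Total wait when calls run sequentially (no parallelism)."""
    return calls * seconds_per_call


for calls in (10, 20):
    slow = total_latency(calls, 3.0)  # the slow model from the text
    fast = total_latency(calls, 1.0)  # upper bound for "under a second"
    print(f"{calls} calls: {slow:.0f}s at 3s/call, at most {fast:.0f}s at 1s/call")
```

The gap grows linearly with the number of steps, so the longer the task, the more the per-call latency dominates the user's wait.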

But there's something more important than how long the wait feels: in agents, you don't need each decision to be perfect; you need the whole thing to reach the right destination. The agent has verification mechanisms, can retry, and can ask for confirmation when something doesn't add up. Intelligence applied at the right step is worth more than maximum intelligence at every step.
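One way to picture that safety net is a step wrapper that checks each result and retries before giving up. This is a minimal sketch under assumed names (`run_step_with_retry`, `attempt`, `is_valid` are all hypothetical); a real agent would likely use richer validation and hand the failure to the user for confirmation.

```python
from typing import Callable, Optional


def run_step_with_retry(
    attempt: Callable[[], str],
    is_valid: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Run one step, verify its result, and retry if the check fails."""
    for _ in range(max_attempts):
        result = attempt()
        if is_valid(result):
            return result
    return None  # all attempts failed: the caller should ask for confirmation


# Example: a flaky step whose first answer fails verification.
answers = iter(["price: ???", "price: 850"])
result = run_step_with_retry(
    attempt=lambda: next(answers),
    is_valid=lambda text: text.split(": ")[1].isdigit(),
)
print(result)  # price: 850
```

A fast model makes this pattern affordable: a retry costs one more sub-second call instead of several more seconds.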

Cost compounds too. Ten calls to an expensive model cost ten times what one call costs. Per token, Gemini 3.5 Flash costs a tenth of what Claude Opus 4.7 does. In a twenty-step agent serving thousands of users a day, that difference is what separates a viable product from one that can't scale.
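The same multiplication applies to the bill. In the sketch below the prices are arbitrary illustrative units, not real list prices; only the ten-to-one ratio comes from the text, and `daily_cost` and the traffic figure of 5,000 tasks a day are assumptions.

```python
def daily_cost(price_per_call: int, calls_per_task: int, tasks_per_day: int) -> int:
    """Daily spend in arbitrary price units (integers avoid rounding noise)."""
    return price_per_call * calls_per_task * tasks_per_day


EXPENSIVE_PRICE = 10  # hypothetical units per call
CHEAP_PRICE = 1       # one tenth of the expensive model, as in the text

expensive = daily_cost(EXPENSIVE_PRICE, calls_per_task=20, tasks_per_day=5_000)
cheap = daily_cost(CHEAP_PRICE, calls_per_task=20, tasks_per_day=5_000)
print(expensive, cheap, expensive // cheap)  # 1000000 100000 10
```

The ratio stays at ten no matter the volume, but the absolute gap scales with every extra step and every extra user.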

The numbers

Flash is not a second-tier model. According to benchmarks published by Google, it outperforms Gemini 3.1 Pro in coding, reasoning and multimodal understanding. The context window reaches one million tokens. Dynamic thinking, which allocates more compute depending on the difficulty of the problem, is enabled by default. And in agentic workflows with tool calls, the same Google benchmarks put it ahead of GPT-5.5 and Claude Sonnet 4.6.

Where Claude Sonnet 4.6 remains better is in complex code review and tasks where a human will read the output directly. The rule is simple: when a model acts as the brain of an automated process, speed and cost come first. When it produces something someone will consume with attention, quality comes first.
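Expressed as code, the rule is little more than a branch on who consumes the output. The tier labels below are hypothetical placeholders rather than real model identifiers, and `Consumer` and `pick_model_tier` are names made up for this sketch.

```python
from enum import Enum


class Consumer(Enum):
    AUTOMATED_PROCESS = "automated_process"  # output feeds the next agent step
    HUMAN_READER = "human_reader"            # output is read with attention


def pick_model_tier(consumer: Consumer) -> str:
    """Apply the rule: speed and cost for automation, quality for readers."""
    if consumer is Consumer.AUTOMATED_PROCESS:
        return "fast-cheap"
    return "high-quality"


print(pick_model_tier(Consumer.AUTOMATED_PROCESS))  # fast-cheap
print(pick_model_tier(Consumer.HUMAN_READER))       # high-quality
```

In practice a single product can use both tiers: the fast one for the intermediate steps and the stronger one for the final text a person will actually read.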

Two companies, two philosophies, one model

Google wants Gemini everywhere: in Search, Android, the cloud, agents. Apple wants its AI to look like its own even when it isn’t entirely. The practical result for the user is that within a few months Siri could be genuinely useful for the first time — something Apple’s own models hadn’t managed to deliver.

Google understood before anyone else where the market is going: not models that impress in demos, but models that work in the background while you do something else.

Two weeks to WWDC. And next month comes Gemini 3.5 Pro, which Google is testing internally. If Flash already delivers these results, the Pro version could shift the current landscape considerably.