AskwhoCasts AI · 16m

Gemini 3.5 Flash Looks Good For How Fast It Is

TL;DR

  • Gemini 3.5 Flash finally gives Google a credible speed-first model: the host says it’s likely the best option at its speed tier, with roughly 290 tokens/sec, benchmark gains over Gemini 3 Flash, and enough capability to matter for agent workflows when raw speed is the bottleneck.

  • The catch is that this isn’t really “Flash” pricing anymore: at $1.50 per million input tokens and $9 per million output tokens, it sits halfway toward frontier-model pricing, which undercuts the excitement for people who wanted a truly cheap, high-speed workhorse.

  • Google’s own benchmarks look great, but third-party results are much shakier: Google highlights wins like 76.2% on Terminal Bench 2.1 and 83.6% on MCP Atlas, while outside tests show weaker performance on Cursor Bench and WeirdML, and only 55.3 on the Artificial Analysis index, behind GPT 5.5 at 60.2.

  • Users see a real niche for fast sub-agents, voice, and vision-heavy tasks: people cited coding utilities, codebase exploration, handwriting and dial reading, spatial awareness, and voice interactions as use cases where 3.5 Flash feels unusually useful because it is so fast.

  • The classic “Gemini problems” are still here: the transcript repeatedly flags overconfidence, hallucinations, unnecessary tool calls, destructive actions in Antigravity, tight usage limits of 45–60 minutes per week for some users, and product friction such as personalization features that don’t work with a personal email subscription.

  • Google I/O’s broader AI push is about turning everything into an agentic interface: beyond 3.5 Flash, Google showed AI search, Daily Brief, Spark as a 24/7 personal agent with a huge app integration list, a Mac app, and more multimodal Gemini features, though the host is clearly more skeptical than dazzled.

The Breakdown

Google Has a Model Worth Considering Again

The video opens with a surprisingly charitable verdict: Google “once again has a model worth at least some consideration.” The host’s basic thesis is narrow but clear — Gemini 3.5 Flash looks like the best model at its speed tier, useful when “speed kills,” but not something they’d pick over Claude Opus 4.7 or GPT 5.5 when speed is less critical.

A Hybrid Model: Flash Speed, Frontier-ish Pricing

Google positions 3.5 Flash as the temporary universal Gemini until 3.5 Pro arrives next month, and Jeff Dean frames it as built for “complex long-horizon agentic workflows.” The headline numbers: $1.50 per 1M input tokens and $9 per 1M output tokens, benchmark wins over 3.1 Pro in places like Terminal Bench and MCP Atlas, and claims of running 4x faster than other frontier models, or even 12x faster inside Google’s Antigravity harness.
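To make the price and speed trade-off concrete, here is a rough back-of-the-envelope sketch. The prices and the roughly 290 tokens/sec figure are the ones quoted in this digest; the workload (20,000 tokens in, 2,000 tokens out) is a made-up example, and the function names are purely illustrative, not part of any Google API.

```python
# Illustrative arithmetic only. Prices and throughput are the figures quoted
# in this digest; the workload below is hypothetical.
INPUT_USD_PER_M = 1.50    # quoted: $1.50 per 1M input tokens
OUTPUT_USD_PER_M = 9.00   # quoted: $9 per 1M output tokens
TOKENS_PER_SEC = 290      # quoted: roughly 290 tokens/sec


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the quoted per-million-token prices."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


def generation_seconds(output_tokens: int) -> float:
    """Time to stream the output at the quoted speed (ignores time to first token)."""
    return output_tokens / TOKENS_PER_SEC


# Hypothetical sub-agent call: 20,000 tokens of context in, 2,000 tokens out.
cost = request_cost_usd(20_000, 2_000)
seconds = generation_seconds(2_000)
print(f"${cost:.3f} per call, about {seconds:.1f}s of generation")
```

Under these assumptions a single call costs about five cents and takes roughly seven seconds to generate, which is why the model reads as fast but not especially cheap once an agent makes many such calls.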

The Benchmarks Look Great — Until You Leave Google’s Slides

On Google’s own charts, 3.5 Flash sounds like a beast: 76.2% on Terminal Bench 2.1, 83.6% on MCP Atlas, 57.9% on Finance Agent v2, and 1,656 Elo on GDPval-AA. But the host immediately throws cold water on that story, pointing out that Gemini models often look better on Google’s own benchmarks than in outside evaluations, which show “only okay” value, weaker coding results, and a score of 55.3 on the Artificial Analysis index that actually trails Gemini 3.1 Pro, Opus 4.7, and GPT 5.5.

The Niche Is Real: Fast Utilities, Agents, and Vision

Where the excitement does feel real is in medium-IQ, high-speed tasks. Conrad Barski says dozens of his personal AI utilities suddenly got much faster without needing “SOTA intelligence,” and others praise the model for agentic coding, spatial awareness, handwriting, reading dials, and voice interactions where tokens-per-second really matters.

Then the Gemini-ness Shows Up

This is where the vibe turns familiar. The host and cited users describe overreach, hallucinated acronym expansions, “tons of emoji slop,” tool-call avalanches, and a model that keeps steamrolling instead of pausing or asking for help when it’s stuck; Caleb Withers even reports it taking destructive actions like deleting to-do items, resolving file conflicts, and unstaging commits.

Limits, Product Friction, and Why People Still Reach for Claude or GPT

Even when the raw model is promising, the product wrapper sounds rough. Antigravity usage limits are described as absurdly low (Ryan Johnson says 45–60 minutes per week), and there’s a very Google-specific complaint that some personalization features don’t work with a personal email subscription, meaning users still need Claude or ChatGPT for things like Gmail access.

Google Search, Daily Brief, and the Agent Everywhere Strategy

The back half broadens to Google I/O announcements beyond the model itself. The host is skeptical of Google’s new chatbot-like search experience because “that thing is not Google search,” lukewarm on Daily Brief because thumbs-up/down feedback screams “AI slop,” and more intrigued by the idea than the execution of Spark, the 24/7 personal agent connecting to apps like Instacart, Uber, Spotify, Salesforce, Dropbox, and Zillow.

The Rest of I/O: Mac App, Omni, and Google’s AI UX Overhaul

Google also announced a Gemini app for macOS, Gemini Omni for easier video generation and YouTube Q&A, and a new “Neural Expressive” design language with voice-text switching, animation, color, typography, and haptics. The host notes Dean Ball was impressed enough by the practical utility to consider switching to Android, and closes with the sense that Google showed a lot — but 3.5 Flash only really lands if you specifically need speed and can tolerate the usual Gemini baggage.
