Rate Limited | 1h 1m

Gemini 3.5 Flash, Composer 2.5 is a Beast, Google IO, We Live in Exciting Times | Ep 16

TL;DR

  • Google’s Gemini 3.5 Flash may be a bad deal despite the “Flash” label: Eric and Adam argue it’s a “token guzzler,” with priority-tier pricing of up to $2.70 per million input tokens and $16.20 per million output tokens, often landing near Sonnet-level costs while giving up the cheap-workhorse role Gemini Flash used to fill.

  • Cursor’s Composer 2.5 is the surprise star of the week for coding — Ray says it has replaced much of his Codex usage because it feels like “a cross between Opus and GPT-5.5,” moves incredibly fast, and even set up a QA workflow that launched multiple browsers and wrote useful bug reports.

  • The real frontier isn’t just models but harnesses, orchestration, and workflow design: a huge chunk of the conversation is about why tools, skills, MCPs, and goal loops still feel clunky, with too much context bloat, inconsistent model behavior, and lots of hidden latency spent figuring out what to do before doing it.

  • Google fumbled the Anti-Gravity 2 rollout and Gemini CLI deprecation — users woke up to an app update that effectively removed the IDE, broke settings, and forced a separate reinstall, while Gemini CLI users got about 30 days to move despite Google having helped steward the ACP standard.

  • GPT-5.5 Low and Medium are quietly becoming default workhorses — all three hosts say they’re leaning heavily on GPT-5.5 because it’s reliable and efficient, with Eric saying 5.5 Low often completes narrow tasks faster than ostensibly smaller models because it overthinks less.

  • Karpathy joining Anthropic feels like a bet on impact and access to the frontier — the hosts frame the move less as a career twist and more as a technologist wanting to be back in the lab, close to researchers and fast-moving ideas, while the window to shape AI still feels unusually open.

The Breakdown

Google I/O Hype Meets Gemini 3.5 Flash Sticker Shock

The episode opens with the crew trying to make sense of an absurdly packed AI week: Google I/O, new Gemini models, Composer 2.5, and even Andrej Karpathy landing at Anthropic. Eric immediately throws cold water on the Gemini 3.5 Flash excitement, calling it expensive in practice because it “burns tokens like nothing else,” to the point that GPT-5.5 can end up faster and cheaper despite its higher sticker price.
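To make the “token guzzler” point concrete, here is a minimal Python sketch of per-request cost. Only the Flash priority-tier prices ($2.70 input, $16.20 output per million tokens) come from the episode; the token counts and the second model's prices are hypothetical placeholders, not real GPT-5.5 pricing or measured usage.

```python
"""Effective cost depends on tokens used, not just the per-token price.

Only the Flash priority-tier prices below come from the episode; every
other number is a hypothetical placeholder for illustration.
"""

# Dollars per million tokens.
FLASH_35_PRIORITY = {"input": 2.70, "output": 16.20}  # figures quoted in the episode
HYPOTHETICAL_PRICIER = {"input": 5.00, "output": 30.00}  # made up, NOT real GPT-5.5 pricing


def request_cost(prices: dict[str, float], input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request at per-million-token prices."""
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000


# Hypothetical token counts for the same task: the "guzzler" re-reads more
# context and writes far more output than the terser model.
flash_cost = request_cost(FLASH_35_PRIORITY, input_tokens=200_000, output_tokens=80_000)
pricier_cost = request_cost(HYPOTHETICAL_PRICIER, input_tokens=100_000, output_tokens=20_000)

print(f"Flash (priority): ${flash_cost:.3f}")  # Flash (priority): $1.836
print(f"Hypothetical pricier model: ${pricier_cost:.3f}")  # Hypothetical pricier model: $1.100
```

With these made-up numbers, the model charging roughly double per token still comes out cheaper because it reads half the input and writes a quarter of the output. A real comparison would need actual token counts from your own runs.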

Flash Isn’t Really “Flash” Anymore

Adam says he loved the older Gemini Flash line for fast agentic workflows, but thinks Google has changed what “Flash” means. The new model is fast in tokens per second, sure, but it’s dramatically pricier, has a weirdly expensive caching model, and seems optimized more for coding than for general-purpose, cheap workhorse duty. Ray adds that it follows highly structured prompts well, but lacks the easy “playful intelligence” he gets from GPT-5.5 when he’s just talking naturally into a mic.

Why Google’s Pricing and Product Moves Feel Off

The group speculates that Google is protecting margins and managing compute scarcity rather than trying to win on price. That leads into a broader complaint: businesses built around the economics of earlier Gemini Flash versions may now be broken, especially since the cheaper Gemini 3 Flash preview is being sunset while the pricier 3.5 GA model takes its place.

Anti-Gravity 2 and the Gemini CLI Backlash

Eric goes on a full rant about Google’s coding-product strategy, especially the Anti-Gravity 2 rollout. Existing users updated one day and suddenly found the IDE gone, settings broken, and their workflow replaced with something closer to a rough Codex-style app; meanwhile Gemini CLI is effectively being killed for consumers, even though enterprise users keep a lifeline. The vibe is less “product evolution” and more “you woke up and your tool got swapped out underneath you.”

Composer 2.5 Becomes the Unexpected Workhorse

Then the energy flips: Ray is almost giddy about Cursor Composer 2.5. He says it took over everything from planning to implementation, and even helped him spin up a QA agent that created skill files, launched multiple browsers, clicked through flows at high speed, and wrote bug reports that caught issues he’d normally find himself.

Why Composer 2.5 Feels Different

Adam backs him up, calling Composer 2.5 “phenomenal” for coding and real-time prototyping during live conversations. Both of them are stunned by the speed, especially if it’s really based on the notoriously heavy Kimi K2.5 model; Eric is impressed too, but warns that shipping rapid-fire checkpoints in response to bug reports can create regressions and make a model feel unstable from week to week.

Tools, Skills, and Why Agent Setups Still Feel Messy

From there the talk gets more philosophical and practical: tool search, MCPs, skills, and giant prompt setups are still awkward. Eric argues that dynamic tool loading is partly a crutch for people who preload too much junk, while Adam says he stays minimalist because once you load 50 or 60 tools, things start falling apart; Ray jokes that he “raw dogs skills” but admits he only wants repeatable workflows in there. The key point: model behavior changes enough across GPT, Claude, and others that even a skill prompt that worked yesterday can break tomorrow.
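The idea behind dynamic tool loading is easier to see with a toy example. The Python sketch below registers a handful of tools and exposes only the few whose descriptions overlap with the task, instead of preloading all of them into context. The tool names, descriptions, stopword list, and keyword-overlap scoring are all hypothetical stand-ins; real harnesses may use embeddings or let the model call a search tool, and nothing here reflects how any specific product implements tool search.

```python
from dataclasses import dataclass

# Hypothetical filler words ignored when matching a task against tool descriptions.
STOPWORDS = {"a", "an", "and", "any", "for", "in", "of", "on", "the", "to"}


@dataclass(frozen=True)
class Tool:
    name: str
    description: str


def tokenize(text: str) -> set[str]:
    """Lowercase words with surrounding punctuation and stopwords removed."""
    words = {w.strip(".,:;!?").lower() for w in text.split()}
    return {w for w in words if w and w not in STOPWORDS}


def select_tools(task: str, registry: list[Tool], k: int = 3) -> list[Tool]:
    """Return up to k tools whose descriptions share the most words with the task."""
    task_words = tokenize(task)
    scored = []
    for tool in registry:
        overlap = len(task_words & tokenize(tool.description))
        if overlap > 0:  # tools with no overlap never enter the context
            scored.append((overlap, tool.name, tool))
    # Highest overlap first; ties broken alphabetically by name for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [tool for _, _, tool in scored[:k]]


# Hypothetical tool registry.
REGISTRY = [
    Tool("read_email", "Read and search messages in the mail inbox"),
    Tool("open_browser", "Open a browser and navigate to a URL"),
    Tool("run_tests", "Run the project's test suite and report failures"),
    Tool("query_database", "Run SQL queries against the analytics database"),
    Tool("create_calendar_event", "Create an event on the calendar"),
]

selected = select_tools("Open the staging URL in a browser and report any failures", REGISTRY)
print([tool.name for tool in selected])  # ['open_browser', 'run_tests']
```

Only the two matching tool definitions would be sent to the model; the other three stay out of the prompt, which is the context-bloat saving being discussed. The trade-off is that a poor match can silently hide a tool the model actually needed.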

The Models They Actually Reach For — and What New Users Should Do

When they compare their real usage, the consensus is clear: GPT-5.5 Medium and Low are getting the bulk of the work, while Anthropic usage has dropped hard. Ray still loves Composer 2.5 for coding and Codex for full computer control, including reading Mail, navigating apps, and running in the background on a Mac with a separate cursor, a capability he thinks people are massively underrating.

They close by talking about Karpathy joining Anthropic and giving advice for newcomers: stop obsessing over code first, look at your daily workflows, automate one repetitive task, and build your “mana” for delegation from there.
