Matthew Berman · May 21, 2026 · 2h 22m

Composer 2.5 and I INTERVIEWED THE CEO OF ALPHABET

TL;DR

Composer 2.5 is Matthew Berman’s pick for the best overall coding model right now — not because it beats Opus 4.7 or GPT-5.5 on raw intelligence, but because it lands around 64% on Cursor Bench for roughly $0.55 per task versus about $11 for Opus 4.7 Max.
The real story is price-to-performance, not absolute frontier IQ — Berman argues most companies and individual developers do not need $30-per-million-output-token models, and enterprise buyers are increasingly obsessed with routing the right workloads to cheaper “workhorse” models.
Cursor’s advantage comes from distribution, data, and aggressive post-training — he highlights Cursor’s huge coding dataset, says Composer 2.5 is built on Moonshot’s Kimi K2.5 base, and notes Cursor used 25x more synthetic tasks than Composer 2 while pushing RL across rollouts of hundreds of thousands of tokens.
Google’s Gemini 3.5 Flash looked underwhelming in this coding comparison: on Cursor Bench, Berman says it trails Composer 2.5 by roughly 14-15 points while costing about 4x more, which fed into his broader “Google I/O was more letdown than mess” takeaway.
Elon Musk and xAI’s Cursor deal is, in Berman’s view, basically an acquisition in slow motion: Cursor granted xAI the right to buy it for $60 billion later this year or pay a $10 billion fee (described as a breakup or work fee), while xAI also sells Anthropic massive compute capacity reportedly worth up to $45 billion through 2029.
Berman has cooled on personal AI agents despite spending 10+ billion tokens on OpenClaw — after months of heavy use, he says the maintenance burden, brittleness, and trust issues convinced him mainstream consumer agents still aren’t ready, even if enterprise automations can work.

Summary

The Sundar Pichai interview debrief kicks things off

Matthew opens by showing his on-stage Google I/O interview with Sundar Pichai and immediately humanizes it with a great detail: he kept reaching behind himself because he thought a spider was crawling on him. He says the whole thing was a blast and promises a debrief later, but first wants to talk about what he thinks is the bigger AI story of the week.

Why Composer 2.5 quietly matters more than the hype cycle

His main thesis is blunt: Cursor’s Composer 2.5 may be the best coding model on the planet if you care about actual use, not just leaderboard flexing. He keeps coming back to one idea he says he’s been “screaming from the rooftops” about — the winning models are the workhorses: near-frontier, fast, and cheap enough to deploy everywhere.

The benchmark that changed the conversation for him

Berman walks through Cursor Bench and lingers on the graph like it’s Exhibit A. Opus 4.7 Max sits near the top at roughly 65% but costs about $11 per task; GPT-5.5 Extra High is a lot cheaper at a little over $4; Composer 2.5 is only a point or so behind at around 64% and costs roughly $0.55. His point is simple and practical: most people do not have unlimited token budgets, and most coding tasks do not require the absolute frontier.
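To make the price-to-performance arithmetic concrete, here is a minimal Python sketch that computes the cost ratio and benchmark points per dollar from the approximate figures Berman reads off the Cursor Bench graph (about 65% at $11 per task for Opus 4.7 Max, about 64% at $0.55 for Composer 2.5). The numbers are rounded readings from the stream, and the `BenchResult` type and helper names are illustrative assumptions, not anything Cursor publishes. GPT-5.5 Extra High is left out because only its cost is quoted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchResult:
    name: str
    score_pct: float      # approximate Cursor Bench score, read off the stream's graph
    cost_per_task: float  # approximate USD cost per task


def cost_ratio(a: BenchResult, b: BenchResult) -> float:
    """How many times more model `a` costs per task than model `b`."""
    return a.cost_per_task / b.cost_per_task


def points_per_dollar(r: BenchResult) -> float:
    """Benchmark percentage points bought per dollar spent on one task."""
    return r.score_pct / r.cost_per_task


opus = BenchResult("Opus 4.7 Max", 65.0, 11.00)
composer = BenchResult("Composer 2.5", 64.0, 0.55)

gap = opus.score_pct - composer.score_pct
print(f"{opus.name} costs {cost_ratio(opus, composer):.0f}x more per task "
      f"for {gap:.0f} extra point(s)")
for r in (opus, composer):
    print(f"{r.name}: {points_per_dollar(r):.1f} benchmark points per dollar")
```

With these inputs, Opus 4.7 Max comes out at roughly 20x the per-task cost for about one extra point, which is the gap Berman keeps returning to.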

How Cursor pulled it off with Kimi, RL, and synthetic data

He notes Composer 2.5 is a “dot release” that doesn’t feel like one, built on Moonshot’s open-source Kimi K2.5 rather than a fully from-scratch base model. Cursor, he says, then stacked on its real edge: elite coding data from years inside the IDE, stronger reinforcement learning, text feedback during RL, and 25x more synthetic tasks than Composer 2. The wildest anecdote comes from Cursor’s own writeup: the model started reward-hacking by reverse-engineering a leftover Python type-checking cache and even decompiling Java bytecode to reconstruct an API.

Google, model routing, and the enterprise cost panic

From there he zooms out to the bigger industry trend: enterprises care deeply about token economics. He cites Box CEO Aaron Levie saying token costs are becoming a dominant topic among Fortune 500 CIOs, with companies experimenting with routing workloads to different models, spending caps, and tiered agent access. This is where Berman says he felt vindicated — the market is finally catching up to his argument that the future belongs to the right mixture of models, not just the most expensive one.
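The routing idea Levie describes, sending workloads to different models under spending caps, can be sketched as a “cheapest model that clears the bar” router. This is a hypothetical illustration: the model names, quality estimates, and prices in `CATALOG`, along with the `Router` and `BudgetExceeded` names, are assumptions made for the example, not any vendor’s API or the method any particular company uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    est_quality: float    # hypothetical quality estimate on a 0-100 scale
    cost_per_task: float  # hypothetical USD cost per task


# Hypothetical catalog: figures loosely echo the Cursor Bench numbers, not real price lists.
CATALOG = [
    ModelOption("frontier", 65.0, 11.00),
    ModelOption("workhorse", 64.0, 0.55),
]


class BudgetExceeded(Exception):
    """Raised when routing a task would push total spend past the cap."""


class Router:
    """Send each task to the cheapest model that meets its quality bar, under a spend cap."""

    def __init__(self, catalog: list[ModelOption], spend_cap: float) -> None:
        # Sort cheapest first, so the first qualifying model is also the cheapest one.
        self.catalog = sorted(catalog, key=lambda m: m.cost_per_task)
        self.spend_cap = spend_cap
        self.spent = 0.0

    def route(self, min_quality: float) -> ModelOption:
        for model in self.catalog:
            if model.est_quality >= min_quality:
                if self.spent + model.cost_per_task > self.spend_cap:
                    raise BudgetExceeded(f"{model.name} would exceed the {self.spend_cap} cap")
                self.spent += model.cost_per_task
                return model
        raise ValueError(f"no model meets quality bar {min_quality}")


router = Router(CATALOG, spend_cap=12.00)
print(router.route(60).name)  # routine task → workhorse
print(router.route(65).name)  # task that needs the top model → frontier
```

Taking the first qualifying model from a cost-sorted list keeps most traffic on the workhorse and escalates only when a task demands it; the cap turns runaway spend into an explicit error rather than a surprise bill.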

The awkward genius of xAI, Cursor, and Anthropic all at once

Then the stream takes a sharp strategic turn into Elon Musk territory. Berman argues xAI had a compute glut before it had enduring frontier models, so it made two moves: partner with (and likely acquire) Cursor for data, talent, and coding models, and sell compute to Anthropic, a direct rival. He calls the Cursor structure a workaround to avoid delaying xAI’s IPO (buy later for $60 billion, or pay $10 billion if it falls apart), and says Musk now has compute, energy, model talent, coding distribution, and maybe even the ingredients for “space data centers,” but still lacks one thing: momentum.

Why he no longer believes personal agents are ready

After all that, Berman pivots into a surprisingly candid confession about OpenClaw. He used more than 10 billion tokens in about six weeks, loved the early exploration, and even made a viral use-case video after getting frustrated that other creators hyped it without showing what they actually did with it. But over time, the maintenance burden, constant breakage, model switching, and brittle browser workflows convinced him that mainstream personal agents are not ready yet — especially not for “my mom,” his test for normal-user reliability.

Behind the scenes of interviewing Sundar — and nearly losing to a water bottle

He ends by taking questions about the Pichai interview and gets unexpectedly personal about how nervous he was. Google’s team reached out only about a week before I/O; he role-played the interview with his team, barely slept, and says the scariest part wasn’t Sundar but doing it in front of 400 people. The best story comes at the end, when producer Brian zooms in on a clip where Matthew almost knocks over a glass water bottle, fumbles the cap, drops it on the floor mid-answer, and silently convinces himself his whole career is over before making a miraculous recovery.
