Riley Brown · May 31, 2026 · 26m

The Truth about Opus 4.8 and other Huge agent News

TL;DR

Opus 4.8 did not feel meaningfully better in practice: Riley says he spent three hours comparing Claude Opus 4.8 to 4.7 and "literally couldn't tell the difference," echoing Greg Isenberg and Matt Wolfe's view that frontier model updates are starting to feel incremental.
GPT 5.5 looks stronger on coding efficiency and trust: Citing DeepSuite benchmarks, Riley says GPT 5.5 scored higher than Opus 4.8 at lower cost, with fewer tokens and less time per task, and he personally trusts GPT 5.5 more for deep coding and long agentic work.
The biggest progress is now in the AI super app, not the raw model: Riley argues the excitement has shifted from model releases to products like OpenAI Codex, where features such as Windows computer use, phone-to-desktop control, and built-in browsing make agents actually usable for daily work.
Codex is turning into a full operating surface for agents: The updates Riley is most excited about are persistent browser sign-in, multiple tabs per task, stronger search with Command-G, and the ability for one agent thread to spin up six or more sub-agents in parallel.
Dedicated vibe-coding tools may get compressed into Codex plugins: Riley thinks products like Replit, Lovable, and Bolt are only a month or two ahead on convenience because Codex can already recreate much of their value with one prompt plus services like Neon, Vercel, Google auth, and AI Gateway.
His biggest bet is 'agent mini apps' that generate the right UI on demand: Instead of static SaaS, Riley imagines agents creating little task-specific interfaces, like a Tinder-style email triage app connected to Gmail, so users can make the final 10 percent of edits and approve actions fast.

The Breakdown

Riley Brown says Anthropic's Claude Opus 4.8 feels like an iPhone-style incremental update: after three hours of testing, he could barely distinguish it from 4.7. OpenAI's real edge, he argues, is shifting to the app layer, with Codex features like computer control from your phone, persistent signed-in browsing, and agents that can spawn other agents.