Riley Brown · May 31, 2026 · 26m

The Latest Codex Updates and The Truth about Opus 4.8

TL;DR

Opus 4.8 did not feel meaningfully different: Riley spent 3 hours comparing Claude Opus 4.8 with 4.7 and says he "literally couldn't tell the difference," echoing Greg Isenberg and Matt Wolfe's view that model releases now feel like annual iPhone updates.
GPT 5.5 looks stronger on long coding tasks and efficiency: Citing DeepSuite frontier coding benchmarks, he says GPT 5.5 medium, high, and extra high scored above Opus 4.8 while costing less, using fewer tokens per task, and finishing faster.
OpenAI's biggest win this week was product, not model: Codex now supports Windows computer use, phone-to-desktop remote control through the ChatGPT app, persistent browser sign-ins, multiple browser tabs, better chat search, and visible GitHub-style usage stats.
Agents spawning sub-agents may be the most important new workflow: Riley shows a single Codex prompt creating six separate background threads with narrow briefs and completion criteria, framing it as the start of master-agent to sub-agent automation.
Dedicated vibe-coding tools are getting squeezed: He argues Replit, Lovable, and Bolt mostly package things that can now be replicated with one Codex prompt plus services like Neon, Vercel, Google auth, and AI Gateway, especially because OpenAI subsidizes GPT 5.5 usage in-app.
The next frontier is agent mini-apps tied to your integrations: His core obsession is generative UI, like an email triage app the agent creates on the fly using Gmail, Slack, or other authenticated tools, so you can review, edit, and send actions directly instead of bouncing back into chat.

The Breakdown

Riley Brown says Anthropic's new Opus 4.8 feels like an iPhone-style incremental update you can barely distinguish from 4.7, while OpenAI's real advantage is shipping fast, practical Codex app upgrades like computer control from your phone, persistent signed-in browsing, and agents that can spawn other agents. His bigger claim is that the next wave is not another marginal model bump, but agent-native apps and mini-app interfaces that AI generates on demand for real work.