The Truth About Opus 4.8 and Other Huge Agent News
TL;DR
Opus 4.8 did not feel meaningfully better in practice: Riley says he spent three hours comparing Claude Opus 4.8 to 4.7 and "literally couldn't tell the difference," echoing Greg Eisenberg and Matt Wolfe's view that frontier model updates are starting to feel incremental.
GPT 5.5 looks stronger on coding efficiency and trust: Citing DeepSuite benchmarks, Riley says GPT 5.5 scored higher than Opus 4.8 at lower cost, with fewer tokens and less time per task, and he personally trusts GPT 5.5 more for deep coding and long agentic work.
The biggest progress is now in the AI super app, not the raw model: Riley argues the excitement has shifted from model releases to products like OpenAI Codex, where features such as Windows computer use, phone-to-desktop control, and built-in browsing make agents actually usable for daily work.
Codex is turning into a full operating surface for agents: The updates Riley is most excited about are persistent browser sign-in, multiple tabs per task, stronger search with Command-G, and the ability for one agent thread to spin up six or more sub-agents in parallel.
Dedicated vibe-coding tools may get compressed into Codex plugins: Riley thinks products like Replit, Lovable, and Bolt are only a month or two ahead of Codex on convenience, since Codex can already recreate much of their value with one prompt plus services like Neon, Vercel, Google auth, and AI Gateway.
His biggest bet is 'agent mini apps' that generate the right UI on demand: Instead of static SaaS, Riley imagines agents creating little task-specific interfaces, like a Tinder-style email triage app connected to Gmail, so users can make the final 10 percent of edits and approve actions fast.
The Breakdown
Riley Brown says Anthropic's Claude Opus 4.8 feels like an iPhone-style incremental update: after three hours of testing, he could barely distinguish it from 4.7. In his view, OpenAI's real edge is shifting to the app layer, with Codex features like computer control from your phone, persistent signed-in browsing, and agents that can spawn other agents.