LIVE VIBE CHECK: Opus 4.8—IT'S A MONSTER
TL;DR
They think Anthropic undernamed it: Dan Shipper, Kieran Klaassen, and Katie Parrott repeatedly say Opus 4.8 feels closer to "Opus 5" or even "5.5" because it improved markedly at coding, writing, design, and general knowledge work all at once.
Extra-high reasoning is where the coding jump shows up: On Every's senior engineer benchmark, Opus 4.8 scored 63/100, about 30 points above Opus 4.7 and one point above GPT-5.5, but that performance appeared only at the extra-high reasoning setting.
The model pushes back in a useful way: Kieran calls it the first model that can "punch you in the face if you do something stupid," meaning it questions your frame without becoming combative or sycophantic, which the team saw in coding, writing, and even interpersonal advice.
Writing improved a lot, but not perfectly: Katie's new writing benchmark put Opus 4.8 at 79.6 versus 73 for GPT-5.5, with only 13 AI tells across eight tasks versus 25 for Opus 4.7, though it still overuses the classic "not X but Y" construction.
It is unusually strong at mixed-skill work: The team highlights a one-shot PowerPoint deck on compound engineering and several design demos as evidence that Opus 4.8 can combine writing, visual taste, coding, and structure in a way that feels more complete than prior models.
Claude's product experience is the weak link: Even while praising the model, Dan says Codex remains his daily driver because the Claude desktop app feels slow and confusing, while Codex is faster, cleaner, and better designed for thread orchestration and browser-based workflows.
The Breakdown
Opus 4.8 beat GPT-5.5 on Every’s senior engineer benchmark, produced their best one-shot deck yet, and left the team saying Anthropic should have just called it Opus 5. The catch is that the model feels ahead of Claude’s own app, with Codex still winning on speed and product design.