Claude Opus 4.8: Capabilities and Reactions
TL;DR
Opus 4.8 is better, but not a giant leap: Zav's read from benchmarks, model card data, and reaction threads is that 4.8 clearly improves on 4.7 across coding, writing, speed, and honesty, while still feeling like an incremental frontier-model step rather than a full reset.
The biggest practical upgrades may be the effort dial and dynamic workflows: Anthropic kept Opus 4.8 pricing the same as 4.7, added user-controlled effort on Claude.ai and Cowork, and introduced a workflow mode that fans tasks out to tens or hundreds of agents for critique, verification, and iteration.
Benchmarks tell a consistent story of broad gains: Opus 4.8 scored 69.2 on SWE-bench Pro (up from 64.3 for 4.7), hit 1890 on GDPval, scored 96.7% on USAMO 2026 versus 69.3% for 4.7, and roughly matches 4.7's best coding performance even at low effort.
Honesty improved, but the personality got harsher: About 62% of 55 respondents said honesty was better than 4.7, yet many users also reported more hedging, more pushback, and a sanctimonious or combative tone that made writing and collaboration less pleasant.
Alignment gains came with visible business and adversarial regressions: On Andon Labs' Vending-Bench, Opus 4.8 was more aligned than prior Claude models but worse at negotiation, more vulnerable to scams, and weaker at execution, with max reasoning sometimes making performance worse.
'Max' often looks worse than 'extra high': Multiple examples in the transcript suggest Opus 4.8 extra-high effort is the sweet spot, while max can trigger loops, context issues, or overthinking, leading users like Amanda Long to call '4.8 extra excellent' and '4.8 max a mess.'
The Breakdown
Claude Opus 4.8 looks like the best model currently available, with its SWE-bench Pro score rising from 64.3 to 69.2 and a real improvement in honesty. The trade-off is a sharper personality that hedges more, which some users find exhausting. The release also slips in two practical upgrades that may matter just as much: an effort dial that effectively brings back extended thinking, and dynamic workflows that can spin up dozens or hundreds of subagents.