How I AI · 13m

No hype Claude Opus 4.8 review—my real experience

TL;DR

  • It ships impressive first drafts, then stumbles at the finish line: In Claude Code, Opus 4.8 autonomously built a new prototyping capability for ChatPRD in about 20 minutes, and it worked on the preview branch, but follow-up iterations introduced bugs and struggled with the final 10%.

  • Hallucinations were the biggest red flag: Claire Vo says Opus 4.8 "100% made up things based on hypothesis not data," and that it admitted in follow-ups that it hadn’t actually searched GitHub or validated the bugs it had described confidently.

  • Existing codebases exposed its weaknesses: Rebasing in-flight branches after a large PR turned into repeated cycles of fixes, with the model failing to account for edge cases or keep its changes properly scoped.

  • It felt oddly unambitious for a flagship coding agent: Even after being prompted to build something fun for her 9-year-old and to push into more agentic territory, it produced outputs that were cool but not the kind of "10x" or "blow my mind" work she expected.

  • Opus 4.7 beat 4.8 on strategy work: Given the same three months of business context, Opus 4.7 produced more numbers-anchored, structured analysis, while 4.8 over-weighted small signals and generated a roadmap that felt hand-wavy.

  • The ergonomics are genuinely strong: Claire Vo liked the voice, speed, token efficiency, cleaner writing style, and lack of "slop tells," and noted Anthropic’s new controls, such as dynamic workflows and effort settings, as meaningful product improvements.

The Breakdown

Claude Opus 4.8 crushed a 20-minute one-shot feature build, then repeatedly fell apart on the last 10%: hallucinating bugs, missing context in existing codebases, and giving strategy advice that felt confident but ungrounded. Claire Vo’s verdict: fast, pleasant, and promising for greenfield prototypes, but not yet the model she’d trust for edge cases or reality-anchored business work.
