Matthew Berman · 2h 34m

OPUS 4.8!!! (also maybe GPT5.6??)

TL;DR

  • Opus 4.8 lands fast and keeps the same base price — Anthropic says the new model improves judgment, honesty about progress, and long-running autonomy while holding standard pricing at $5 per million input tokens and $25 per million output tokens.

  • Fast mode got materially more attractive — Berman highlights that Opus 4.8 fast mode is now 3x cheaper than before, making it effectively around 2x the standard price while delivering roughly 2.5x the speed.

  • Benchmarks look strong, but the vibe check still matters more — Opus 4.8 posts 69.2% on SWE-Bench Pro and beats GPT 5.5 on Humanity’s Last Exam, yet Berman argues benchmarks like DeepSuite better reflect real coding feel, where many users still prefer GPT 5.5.

  • Dynamic workflows is Anthropic’s big product swing — The new Claude Code feature can orchestrate tens to hundreds of parallel sub-agents for bug hunts, migrations, and adversarial verification, though Berman immediately heard “my API bill skyrocket” when reading the description.

  • Live testing showed both frontier capability and jagged intelligence — Opus 4.8 built a playable 3D soccer game, a mostly working Rubik’s Cube solver, and a rough video editor, but still stumbled on the car-wash logic question and produced a weak homepage redesign after 29 minutes and 18 agents.

  • Anthropic looks newly unconstrained on compute and cash — During the stream Berman spots news of a $65 billion Series H at a $965 billion post-money valuation, ties it to Anthropic’s xAI/Amazon/cloud deals, and notes his Max plan usage was shockingly low despite heavy parallel-agent testing.
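
To make the pricing bullets concrete, here is a minimal sketch of the cost arithmetic. It uses the standard prices quoted above ($5 per million input tokens, $25 per million output tokens) and assumes fast mode is a flat 2x multiplier, which is only Berman's rough figure. The function name `estimate_cost` and the token counts are hypothetical, chosen for illustration.

```python
# Standard-mode prices quoted in the digest, in USD per million tokens.
INPUT_PER_MTOK = 5.00
OUTPUT_PER_MTOK = 25.00
# Assumption: fast mode costs about 2x standard (Berman's approximate figure).
FAST_MULTIPLIER = 2.0


def estimate_cost(input_tokens: int, output_tokens: int, fast: bool = False) -> float:
    """Return an estimated USD cost for one request (hypothetical helper)."""
    cost = (input_tokens / 1_000_000) * INPUT_PER_MTOK
    cost += (output_tokens / 1_000_000) * OUTPUT_PER_MTOK
    return cost * FAST_MULTIPLIER if fast else cost


# Hypothetical workload: 200k input tokens and 40k output tokens.
standard = estimate_cost(200_000, 40_000)
fast = estimate_cost(200_000, 40_000, fast=True)
print(f"standard: ${standard:.2f}, fast: ${fast:.2f}")
```

Under these assumptions, output tokens dominate the bill quickly: 40k output tokens cost as much as 200k input tokens. That is one reason a feature that fans out to many parallel sub-agents raises cost concerns.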

The Breakdown

Claude Opus 4.8 arrives just six weeks after 4.7 with a claimed 69.2% on SWE-Bench Pro, a cheaper fast mode, and a new dynamic workflows feature that can spin up parallel sub-agents. Matthew Berman's live tests, however, were a mix of genuinely impressive demos and very funny failures on basic logic prompts. The bigger surprise may be Anthropic signaling that an even stronger "Mythos"-class model is coming within weeks, alongside a massive new funding round and what looks like a real easing of its long-hated usage limits.
