Matthew Berman · 1h 31m

Anthropic did a thing...

TL;DR

  • Anthropic says Claude wrote 80%+ of its merged code by May 2026: Berman treats that as a major industry signal, especially because Anthropic claims this jumped from low single digits before Claude Code launched in February 2025.

  • Task horizons are lengthening fast, from minutes to hours to potentially days: He highlights Anthropic's claim that models went from handling 4-minute software tasks with Opus 3, to 90-minute tasks with Sonnet 3.7, to 12-hour tasks with Opus 4.6.

  • The missing ingredient for full recursive self-improvement is research taste, not coding ability: Models can reproduce papers and execute experiments, but Berman says they still struggle with deciding what truly novel problem to pursue next.

  • Anthropic's own numbers suggest AI code is prolific but not yet clearly better: Employees estimated roughly 4x productivity gains with Mythos preview, even while code output rose about 8x, which Berman reads as a sign that AI-written code was still lower quality or harder to absorb operationally.

  • Research automation is improving at a startling rate: He calls out Anthropic's benchmark where Opus 4 found 3x speedups in May 2025, while Mythos preview reached 52x by April 2026, plus CoreBench moving from about 20% success to near benchmark saturation in 15 months.

  • Berman sees Anthropic's safety language as strategic as much as sincere: He repeatedly frames the paper as 'Anthropic coded', arguing the company is asking the world to slow down while using internal models like Mythos to extend its own lead.

The Breakdown

Anthropic says more than 80% of the code merged into its codebase is now written by Claude, and Matthew Berman argues that this is the clearest public signal yet that recursive self-improvement is moving from sci-fi concept to real engineering practice. His big takeaway is that the missing piece is no longer execution but judgment: humans still choose which problems matter, while AI is rapidly taking over the rest.
