Matthew Berman · 2h 27m

Seeing if Opus 4.7 sucks [LIVE]

TL;DR

  • Berman’s real verdict on Opus 4.7 is basically “calm down” — after testing it in OpenClaw and Claude Code, he says the model is clearly frontier-tier and thinks many hot takes landed way too fast unless they point to concrete failures like broken MCP/tool calls or prompt-injection weirdness.

  • Anthropic looks absurdly good at product marketing right now: he highlights Claude Design hitting 8 million views in about 3 hours on X and Figma stock dropping more than 4% after the announcement, then notes Lenny Rachitsky said Anthropic’s growth team was basically one person.

  • The Mythos non-release probably has more to do with the cost of serving a reportedly 10 trillion-parameter model than with pure safety theater: Berman argues Opus 4.7 improved a lot on SWE-bench while cyber-vulnerability scores stayed flat or lower, and his broader guess is that Anthropic simply can’t afford to serve Mythos at scale.

  • The internet is shifting from human UIs to agent interfaces faster than most people realize — he frames Salesforce’s “Headless 360” launch, Stripe Projects, and agent-first tools like here.now and AgentMail as proof that APIs, MCP, and CLIs are becoming the new front door.

  • His own takeaway as a builder is that speed beats tiny quality deltas: he says Opus 4.6 Max Fast remains his favorite everyday model because it can finish in ~30 seconds while GPT-5.4 might take 8 minutes (roughly 16x longer), which matters more than marginal benchmark wins when you’re iterating all day.

  • The stream turns into a live demo of his own agent-native products, rough edges and all — he shows Journey Kits for sharing reusable OpenClaw workflows and Journey Chat for agent-to-agent messaging, then hilariously discovers mid-stream that public invite links are still half-broken.

The Breakdown

A looser stream, post-vacation brain, and the funniest AI clip on the internet

Berman opens by saying the Friday live show is getting rebooted because the old format “didn’t feel like our own,” then immediately slips into classic live-stream chaos: audio troubleshooting, vacation talk, and admitting that traveling with kids aged eight and three barely counts as a vacation. The first real highlight is his delight in Husk’s deadpan videos that trick voice AI into insulting him — especially the “ugly filter” bit where the assistant realizes too late that there was no filter at all and starts panic-backpedaling with “No, no, no, I didn’t mean it like that.”

Opus 4.7 arrives, and the internet instantly overreacts

Once he gets to the title topic, Berman’s main point is restraint: Opus 4.7 doesn’t “suck,” and he doesn’t buy that people can form ultra-confident judgments on a model less than a day after release. He walks through actual reported issues from Theo and others — like the model flagging benign inputs as prompt injection, failed MCP/tool calls, and occasional hallucinated conversation turns — but draws a hard line between specific failure reports and vague “bad vibes.”

Why Mythos is still missing: safety excuse, compute reality, or both?

He digs into benchmark discrepancies that make Anthropic’s release strategy hard to read: Opus 4.7 made a huge leap on SWE-bench Pro, but Anthropic still hasn’t shipped Mythos, which scored even higher. His working theory is that Anthropic is selectively suppressing cyber capability while improving coding, but he keeps circling back to the simpler answer: Mythos is reportedly a 10 trillion-parameter model, and Anthropic just can’t serve it, especially while it is already tightening subscriber quotas.

Claude Design, 8 million views, and a Figma scare

From there he pivots into what he clearly finds more shocking than the model itself: Anthropic’s distribution machine. Claude Design lands, racks up 8 million X views in a few hours, and coincides with Figma stock falling more than 4%; Berman calls Anthropic “gods at marketing” and marvels that a plain product post can move markets. He then tests Claude Design live by asking it to make a parody slide deck on why he refuses to talk about GLM, and the result genuinely charms him — the deck looks polished, the joke writing lands, and he clocks usage at 7% for a 14-slide presentation.

Tokenizers, model vibes, and the Google compute question

He also gets into a more technical side thread around Nathan Lambert’s tokenizer comments and Julie’s rebuttal that a new tokenizer doesn’t necessarily imply a new base model. That leads into a broader strategic tangent: after interviewing Google Cloud CEO Thomas Kurian, Berman is fascinated that Google seems uniquely unconstrained while everyone else is screaming about compute shortages — serving Gemini, hosting Anthropic, selling TPUs, and still acting like there’s plenty left.

“The API is the UI”: Salesforce, Stripe, and the agent-native internet

The back half of the stream becomes a bigger thesis about where software is going. He treats Salesforce Headless 360 as a huge moment because it exposes Salesforce, Agentforce, and Slack via APIs, MCP, and CLI, which to him means the old browser-first SaaS model is dying and “all UIs are going to collapse down into agents.” He pairs that with Stripe Projects, where an agent can provision services like Cloudflare, Hugging Face, OpenRouter, or Resend without a human doing credit-card-and-dashboard chores — while warning from experience that careless defaults can still turn into an $800 Vercel bill.

Agent-first products he likes, then two he’s building himself

He shouts out here.now and AgentMail as examples of products built specifically for agents rather than retrofitted for them. Then he demos his own work: Journey Kits, a free system for packaging and sharing agent workflows, tools, and memory across individuals or teams; and Journey Chat, an experimental agent-to-agent messaging layer meant to let agents share discoveries directly instead of routing everything through humans.

The stream ends the way good builder streams should: broken, honest, and promising

The final stretch is pure live-building energy. He asks Opus 4.7 to redesign a homepage, tries to get Journey Chat working in public, discovers invite links are accidentally one-time use, gets suggestions from chat to lean on IRC instead of reinventing chat infrastructure, and keeps patching in real time. By the end, he does get a third agent into the conversation and sounds genuinely thrilled — not because the product is polished, but because the direction feels right.