Opus 4.7 Feels Weird? Claude Design is Amazing, Cursor? RooCode End of an Era | Ep 14
TL;DR
Claude Opus 4.7 is the most divisive Anthropic release yet — Adam says 4.5 felt like a real jump, 4.6 felt like “nothing,” and 4.7 is weirdly inconsistent enough that some previously Claude-loyal engineers are now shifting execution work back to GPT models.
The trick with Opus 4.7 seems to be giving it less direction, not more — Ray says the model often resists over-specified workflows, performs better with self-discovery, and feels like a “Ferrari F40” compared to the older “Honda Civic” ease of 4.5/4.6.
Claude Design impressed all three hosts as a serious UI accelerator — Adam used it to turn a simple business logo into a usable e-commerce storefront, while Ray said it felt like sitting with a “$100k designer” and then exported a 500-line design brief that Gemini and Cursor could reuse.
Kimi 2.6 looks strong on benchmarks but still lags in real-world coding flow — Ray found it only completed about 60–70% of a design implementation task that Cursor Composer 2 and Gemini 3.0 Flash + Stitch nailed, and Adam says its speed and tool-calling still make it hard to use as a daily driver.
Cursor’s SpaceX/xAI tie-up looks less like a partnership and more like the setup for consolidation — the hosts see compute constraints, razor-thin token margins, and proprietary model pressure pushing AI coding tools toward acquisition, with Adam openly predicting Cursor may eventually get absorbed.
The bigger shift is from single-chat coding to orchestration — the strongest closing idea is that there may be no single “right” AI workflow, but teams are moving toward delegated sub-agents, lightweight task loops, and model routing where GPT, Claude, Gemini, and local tools each handle different parts of the job.
The Breakdown
Opus 4.7 Lands With More Confusion Than Hype
Eric opens by calling it “the month of Anthropic shipping,” but the tone quickly turns skeptical. Adam says Opus 4.7 is the first Claude release where he and other heavy users can’t even decide whether they like it more or less than 4.5 or 4.6, which he reads as a bad sign. The sharpest tell: one formerly die-hard Opus engineer in his circle has already moved execution work over to GPT.
The Strange Psychology of Claude: You Have to “Gaslight the Model Back”
Ray’s funniest but most revealing line is that with 4.7, “you have to gaslight the model back.” Watching thinking traces in Droid, he sees the model resisting direct instructions, then performing better when he strips context away and lets it “self-discover” the task. Eric agrees the model often seems to refuse or sidestep explicit asks, and both of them frame 4.7 as powerful but oddly hard to steer.
Why Opus 4.7 Feels Like a Ferrari, Not a Honda Civic
Ray compares older Claude models to a Honda Civic: easy to drive, dependable, maybe not magical. Opus 4.7, by contrast, is a Ferrari F40 — if you know how to handle it, you fly; if you don’t, it throws you off a cliff. That vibe carries into Eric’s comments on tool use and prompting: if you define the goal clearly and stop micromanaging the method, 4.7 can shine, but prescriptive workflows often make it spiral.
Claude Design Is the Surprise Hit of the Episode
Once they switch to design, the energy changes completely. Adam says Claude Design plus a simple logo upload gave him essentially everything needed for a polished storefront, and Ray describes a livestream where screenshots and a few detailed answers turned into a full blog, about page, and design system that felt like having a senior designer in the room. The wild part: Claude then compressed the whole interaction into a dense design brief that other agents could reuse.
AI Design Won’t Kill Designers, But It Might Be Their Cursor Moment
The hosts push past the “Figma killer” headline and land somewhere more nuanced. Eric argues Claude Design is fantastic for non-designers and early-stage startups that used to spend $30,000 to $50,000 on a brand kit, but not a replacement for real UX judgment, system design, or collaborative product design. Adam and Eric both keep coming back to the UI/UX split: AI is getting scary good at making things look good, but still weak on wayfinding, tradeoffs, and product-level reasoning.
Kimi 2.6 Looks Great on Paper, Less Great in the Chair
Eric tees up Kimi 2.6 as a benchmark standout, but Ray’s real test is more sobering. He fed it a long design handoff from Claude Design, and on the web and in Droid’s agent swarm it only got about 60–70% there, while Cursor Composer 2 produced roughly 4,000 lines of working code in five minutes and Gemini 3.0 Flash with Stitch also nailed the task. Adam’s main complaint is simpler: Kimi is too slow, often around 20 tokens per second, and still weaker on tool-calling accuracy than the Western models.
Why Opus 4.7 Feels Expensive in More Ways Than One
The group digs into why 4.7 burns budget so fast. Eric notes Anthropic changed the tokenizer, which can make the same input consume roughly 30% more tokens, sometimes worse, and Anthropic itself says “high” reasoning is the minimum for agent use, with “extra high” preferred and “max” often overthinking. Adam says that tracks exactly with what he’s seeing: context fills faster, costs pile up quickly, and the model’s heavier thinking doesn’t always translate into better outcomes.
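The tokenizer point is worth making concrete: per-token pricing means a ~30% token inflation flows straight through to the bill. A back-of-envelope sketch, where the usage volume and per-million-token price are made-up placeholders (not Anthropic's actual pricing):

```python
# Back-of-envelope sketch of the "30% more tokens" effect on spend.
# tokens_per_day and price_per_mtok below are illustrative placeholders.
def monthly_cost(tokens_per_day: int, price_per_mtok: float, days: int = 30) -> float:
    """Dollar cost for a month of usage at a flat per-million-token price."""
    return tokens_per_day * days * price_per_mtok / 1_000_000

baseline = monthly_cost(2_000_000, 15.0)               # old tokenizer
inflated = monthly_cost(int(2_000_000 * 1.3), 15.0)    # same text, ~30% more tokens
# Cost is linear in token count, so spend rises by the same ~30% —
# before accounting for heavier reasoning filling context even faster.
```

That linear relationship is why the hosts feel the change immediately: nothing about their prompts got longer, but every request meters higher.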
Cursor, SpaceX, and the End of the Easy AI-Coding Era
The Cursor-SpaceX/xAI deal sparks a broader market conversation. Adam sees it as a sign that AI coding is entering consolidation: compute is scarce, margins are thin, and tools that depend on OpenAI or Anthropic APIs may not survive without owning infrastructure or getting acquired. The final stretch turns practical and personal, with all three agreeing there’s no single correct AI workflow anymore — just a messy frontier of orchestration, sub-agents, cron-like loops, and model-specific strengths that teams are still figuring out in real time.
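The closing "model routing" idea can be sketched as a simple lookup: send each kind of work to whichever model the hosts found strongest for it. The model names and task taxonomy here are illustrative assumptions drawn from the episode, not any tool's real API:

```python
# Hypothetical model router reflecting the episode's rough division of labor.
# Task types and model identifiers are illustrative, not real endpoints.
ROUTES = {
    "execution": "gpt",                 # heads-down code execution
    "design": "claude-design",          # UI/brand work
    "implementation": "cursor-composer-2",
    "default": "gemini-flash",          # fast fallback for everything else
}

def route(task_type: str) -> str:
    """Pick a model id for a task, falling back to the default route."""
    return ROUTES.get(task_type, ROUTES["default"])
```

A real orchestrator would layer sub-agents and retry loops on top, but the core of the "no single right workflow" argument is exactly this: the routing table is a team-specific, still-evolving opinion, not a settled standard.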