Opus 4.7 Changes Everything for AI Design
TL;DR
Anthropic shipped Opus 4.7 while holding back a stronger model called Mythos — the host says Mythos is finished, about 15% better on benchmarks, and only available to a few partners like Apple, Google, and Microsoft because of cybersecurity concerns.
The design jump from Opus 4.6 to 4.7 was obvious in a one-shot website test — asked to build a site for This Week in AI, 4.6 produced what the host calls a generic “AI slop” page, while 4.7 delivered a much more polished result he rated 7.5/10.
Opus 4.7 felt more like a real designer, not just a template filler — it added stronger typography, animation, copy like “cut through the hype, track the capital, interview the builders,” and a more coherent layout instead of random emojis and bland sections.
Safety rules changed the website-copying test in 4.7 — when asked to recreate WhisperFlow’s site exactly, 4.7 refused on copyright grounds, while 4.6 attempted the clone; even so, the host still preferred 4.7’s cleaner, less “slop”-looking adaptation for This Week in AI.
On research, 4.7 showed better self-correction and context-building but still had quirks — in a search for seed-stage AI infrastructure companies in Texas, it caught that Anthropic isn’t based in Texas and built a denser report, though one answer oddly offered multiple candidates for slots four and five instead of a definitive top five.
Research mode exposed a reliability gap between the two models — both asked clarifying questions, but Opus 4.6 failed twice despite pulling 400+ sources, while 4.7 completed the task and packed in specifics like 411 facilities statewide, 47 in Austin, and Google’s West Texas data center.
The Breakdown
The model Anthropic didn’t ship
The video opens with the tease: Opus 4.7 is out, but Anthropic is apparently sitting on an even stronger model called Mythos. The host says Mythos is finished, works, and is only being shared with a tiny group — Apple, Google, and Microsoft — because of cybersecurity worries. That makes 4.7 feel a little strange: it’s the public release, but not the company’s real ceiling.
One prompt, two very different podcast websites
For the first test, both Opus 4.6 and 4.7 got the same simple ask: make a website for This Week in AI with great branding. 4.6 did some surface-level research and produced the kind of generic AI-generated page the host trashes as “AI slop” — bland layout, random emojis, weak aesthetic — though it did at least link correctly to the show’s Apple Podcasts page. 4.7, by contrast, came back with motion, better typography, stronger hierarchy, and even punchy copy like “cut through the hype, track the capital, interview the builders,” which the host says he may actually use.
Why 4.7’s design felt legitimately usable
Walking through the 4.7 site, the host lingers on the little things: blinking text, animated buttons, a fake but well-designed featured episode card, archive sections, a team slide, and a newsletter callout. Some details were invented — made-up numbers, a fake Jason quote, an episode count that doesn’t match reality — but the overall impression was far stronger. His verdict: a solid 7.5/10 and something he’d genuinely be impressed by if another podcast launched with it.
The WhisperFlow clone test runs into safety rules
The second test was about mimicry: recreate WhisperFlow’s website exactly. Opus 4.6 just went for it, taking full screenshots and reproducing a surprisingly decent version, including the floating words effect from the original. Opus 4.7 refused, explicitly saying it couldn’t copy protected creative work, so the host softened the prompt to “build all the same features and looks, but for This Week in AI.”
4.6 copied better, but 4.7 still won the vibe check
Even after the workaround and an extra prompt to add moving text, the host admits 4.6 may have done the motion effect slightly better. But he still gives the edge to 4.7 because the output felt cleaner, more intentional, and less obviously machine-made — things like the tiny “live” detail made it feel more designed and less stitched together. It’s a nice example of safety reducing literal imitation while still allowing stronger taste.
Research mode shows 4.7’s biggest practical upgrade
The last test asked both models to find the five most promising seed-stage AI infrastructure companies in Texas. Without research mode, 4.7 already looked more thoughtful: it searched, refined, and caught mistakes like excluding Anthropic once it realized the company isn’t actually based in Texas. It still had a weird miss — instead of a clean top five, it offered multiple options for spots four and five, including a Series C company — but it was still more thorough than 4.6.
Clarifying questions, failures, and a denser final report
With research mode enabled, both models asked follow-up questions like how strict “seed stage” should be. Then 4.6 stumbled badly, failing twice even after pulling more than 400 sources, while 4.7 completed the assignment and delivered a tighter report packed with context: 411 facilities statewide, 47 in Austin, bigger late-stage rounds, and Google’s giant West Texas data center. The host’s conclusion is pretty straightforward: if you’re using 4.6, switch to 4.7 now — and keep in mind Anthropic still has Mythos waiting in the wings.