Latent Space · June 6, 2026 · 40m

⚡️Making DeepSeek v4 outperform Opus 4.7 with Taste - @AhmadAwais, CommandCode.ai

TL;DR

DeepSeek's weakness was often the harness, not the model: Awais says DeepSeek V4 Pro repeatedly sends malformed tool calls and then ignores the schema errors that come back, racking up 50-plus failures per session. Deterministic repair logic can fix the call, return the result, and teach the model what went wrong.
Command Code learned 16,000 repair patterns across huge usage volume: After first spotting the issue in DeepSeek, the team found the same behavior in Kimi and MiniMax and built its library of repair variations from hundreds of billions of tokens of usage. The platform now processes roughly 600 billion tokens.
Taste is automatic repository-level memory for coding preferences: Instead of manually maintaining a giant rules file, Taste watches merges, edits, accepts, and rejects, then writes small markdown preferences like “use pnpm, but npm global link for local CLI linking” directly into the repo.
The same repair mindset applies to design slop: By giving models a small set of layout patterns, design “smells,” and preferences like OKLCH over HSL, Awais says they can remove most of the generic indigo-gradient dashboard look that designers spot in 1.5 seconds.
Open models are becoming the sweet spot for Command Code: Claude is forgiving enough to recover from bad tool interactions on its own, but open models benefit much more from a smart harness, which is why Command Code found product-market fit there and even launched a $1 plan for 600 million DeepSeek tokens.
Command Code is heading toward an Apple-style open source model strategy: Awais says the six-year-old codebase will be open sourced soon and made deeply hackable, but intentionally curated around a smaller set of strong open and closed models instead of supporting every model under the sun.
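
To make the Taste idea from the list above concrete, here is a minimal sketch of how accept and reject signals could be folded into small markdown preferences. Everything in it is an assumption for illustration: the event shape, the `min_signal` threshold, and the function names are hypothetical, and the episode does not describe how Command Code actually scores or stores preferences.

```python
from collections import Counter

def learn_preferences(events, min_signal=2):
    """Keep preferences whose accepts outnumber rejects by at least min_signal.

    events is a list of (preference, accepted) pairs, a hypothetical shape
    standing in for observed merges, edits, accepts, and rejects.
    """
    score = Counter()
    for preference, accepted in events:
        score[preference] += 1 if accepted else -1
    return [p for p, s in score.items() if s >= min_signal]

def render_taste_markdown(preferences):
    """Render learned preferences as a short markdown bullet list."""
    return "\n".join(f"- {p}" for p in sorted(preferences)) + "\n"

# Hypothetical signals collected from one repository.
events = [
    ("use pnpm", True),
    ("use pnpm", True),
    ("use yarn", False),
    ("prefer OKLCH over HSL", True),
]
taste = render_taste_markdown(learn_preferences(events))
```

The threshold means a single accept is not enough to write a rule, which is one simple way to keep such a file small instead of growing into a giant rules file.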

The Breakdown

A 3,200-line “repair” layer turned DeepSeek V4 from a stubborn tool-calling mess into something Ahmad Awais says can beat Opus 4.7 in real coding workflows, and the same idea now appears to clean up AI design slop too. The bigger claim is that many model failures are not capability gaps at all, but contract gaps between the model and the harness around it.
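
The repair idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Command Code's repair layer: the schema format, the three repair rules, and the names `repair_tool_call`, `run_with_repair`, and `read_file` are all hypothetical. It shows the three steps Awais describes: fix the call deterministically, return the result, and tell the model what was wrong.

```python
import json
from typing import Any, Callable

# Hypothetical tool schema: parameter name -> expected Python type.
READ_FILE_SCHEMA = {"path": str, "max_lines": int}

def repair_tool_call(raw_args: str, schema: dict[str, type]) -> tuple[dict[str, Any], list[str]]:
    """Apply deterministic repairs to tool arguments and record each fix."""
    notes: list[str] = []
    args = json.loads(raw_args)

    # Repair 1: arguments double-encoded as a JSON string instead of an object.
    if isinstance(args, str):
        args = json.loads(args)
        notes.append("arguments were a JSON string; send a JSON object instead")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")

    repaired: dict[str, Any] = {}
    for key, value in args.items():
        # Repair 2: camelCase key where the schema expects snake_case.
        snake = "".join("_" + c.lower() if c.isupper() else c for c in key)
        if key not in schema and snake in schema:
            notes.append(f"renamed '{key}' to '{snake}'")
            key = snake
        if key not in schema:
            notes.append(f"dropped unknown parameter '{key}'")
            continue
        # Repair 3: integer sent as a string.
        if schema[key] is int and isinstance(value, str):
            try:
                value = int(value)
                notes.append(f"converted '{key}' from string to integer")
            except ValueError:
                pass  # Not repairable here; pass the value through unchanged.
        repaired[key] = value
    return repaired, notes

def run_with_repair(raw_args: str, schema: dict[str, type], tool: Callable[..., Any]) -> dict[str, Any]:
    """Run the tool on repaired arguments and return the result plus feedback."""
    args, notes = repair_tool_call(raw_args, schema)
    return {"result": tool(**args), "repair_notes": notes}

def read_file(path: str, max_lines: int = 100) -> str:
    """Stand-in tool that only reports what it was asked to do."""
    return f"read {max_lines} lines from {path}"

# A malformed call: double-encoded, camelCase key, stringly-typed number, extra key.
bad_call = json.dumps(json.dumps({"path": "a.txt", "maxLines": "10", "verbose": True}))
response = run_with_repair(bad_call, READ_FILE_SCHEMA, read_file)
```

Returning `repair_notes` alongside the result is the "teach the model" step: the call succeeds, and the model still sees which parts of its request had to be corrected.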