Matthew Berman · 17m

My Honest Thoughts about Deepseek

TL;DR

  • DeepSeek V4 is the bigger story than its benchmarks — Matthew Berman says the headline isn’t that China “caught up,” but that DeepSeek released a frontier-adjacent, open-weights model with a 1 million token context window at a fraction of US-model pricing despite working under chip constraints.

  • The real threat is cost-performance, not who wins a benchmark by 1-2 points — DeepSeek V4 Pro trails models like Opus 4.7 and GPT-5.5 slightly on tests such as MMLU Pro and GPQA Diamond, but Berman argues that for most enterprise use cases, “nearly as good” at a much lower price is what wins the buying decision.

  • DeepSeek’s efficiency story matters because it was built with fewer resources and unusually transparent reporting — he highlights the 1.66 trillion total-parameter MoE model with 49 billion active parameters, the smaller V4 Flash at 284 billion total / 13 billion active, and DeepSeek’s unusually candid white paper that even admits failures.

  • Export controls are both working and failing at the same time — Berman says China clearly has less compute access than the US, but DeepSeek shows that algorithmic innovation can offset weaker hardware enough to produce frontier-level systems on “nerfed Nvidia GPUs” and domestic Chinese chips.

  • The distillation panic doesn’t fully explain DeepSeek’s quality — citing Anthropic’s report, he notes DeepSeek was associated with only about 150,000 exchanges versus Moonshot’s 3.44 million and MiniMax’s 13 million, which he says is far too little to account for DeepSeek’s performance on its own.

  • Berman’s core warning is geopolitical and economic — if US and allied enterprises choose cheap, customizable Chinese open-source models over $30-per-million-output-token options like GPT-5.5, the US risks losing AI platform control, return on massive infrastructure spending, and even influence over what global AI systems are allowed to say.

The Breakdown

DeepSeek V4 arrives, and Berman says the benchmarks aren’t the main event

Berman opens by saying he was ready to do a normal model review, then realized the real story was much bigger. The shocking part, in his framing, is that China produced a frontier-level, open-weights model for far less money and compute than the US labs that supposedly have every structural advantage.

The R1 flashback: the moment DeepSeek first changed the conversation

He rewinds 18 months to DeepSeek R1, which he says proved open-source labs outside the US could build “thinking” models once thought exclusive to American closed labs. He recalls the stock market dropping roughly 20% overnight and uses the Jevons paradox to explain why cheaper AI didn’t reduce demand for Nvidia GPUs — it made people want even more.

What’s actually in V4: giant MoE scale, million-token context, and a cheaper workhorse

Berman walks through the launch post: V4 comes in Pro and Flash, with Pro offering a 1 million token context window and a 1.66 trillion-parameter mixture-of-experts model with 49 billion active parameters. Flash is the faster, cheaper “workhorse” at 284 billion total parameters and 13 billion active, and both were trained on roughly 33 trillion tokens.
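These parameter counts are the heart of the efficiency story: in a mixture-of-experts model, only a small slice of the weights fires for each token. As a back-of-the-envelope illustration (not from the episode), the active-parameter fractions implied by the quoted figures can be computed directly:

```python
# Rough arithmetic on the parameter counts quoted in the launch post.
# Totals and active counts are as reported; treating the active fraction as a
# proxy for per-token compute is a simplification (it ignores attention,
# embeddings, and expert-routing overhead).

def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters that fire for each token in an MoE model."""
    return active_b / total_b

# Figures from the launch post, in billions of parameters.
models = {
    "V4 Pro": (1660, 49),    # 1.66T total / 49B active
    "V4 Flash": (284, 13),   # 284B total / 13B active
}

for name, (total_b, active_b) in models.items():
    frac = active_fraction(total_b, active_b)
    print(f"{name}: {frac:.1%} of parameters active per token")
# → V4 Pro: 3.0% of parameters active per token
# → V4 Flash: 4.6% of parameters active per token
```

In other words, despite the headline 1.66 trillion parameters, each token touches only about 3% of the model — which is why a giant MoE can be far cheaper to serve than a dense model of the same total size.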

Good enough to be dangerous: the benchmark story is about price, not bragging rights

On charts like MMLU Pro, GPQA Diamond, and SWE-bench Verified, DeepSeek sits slightly behind models like Opus 4.7 and GPT-5.5 — but not by much. Berman’s point is blunt: most buyers do not need the absolute smartest model on Earth, and if DeepSeek is close enough while costing far less, that changes the market.

The US-China race is now a pattern: America jumps ahead, Chinese open source closes the gap

He shows the Arena Elo arc from GPT-4 through Qwen, GLM-4, o1-preview, and then DeepSeek R1, describing a repeating rhythm where US frontier labs surge and Chinese open-source labs catch up. His warning is that China has historically been behind, but that pattern may not hold forever.

Export controls help — but DeepSeek shows their limit

Berman says export controls do constrain China, because it plainly has less compute than the US even if some chips are smuggled in. But DeepSeek proves the controls don’t solve everything, because algorithmic gains are letting Chinese labs train and serve impressive models on weakened Nvidia hardware and local chips.

Distillation attacks are part of the story, but not the whole explanation

He brings in Anthropic’s report and the US government statement alleging large-scale Chinese distillation campaigns, then immediately complicates the narrative. DeepSeek’s reported activity — about 150,000 exchanges — looks tiny next to Moonshot’s 3.44 million and MiniMax’s 13 million, and he says benchmark testing can look a lot like “distillation” from the outside.

Why this gets economic and geopolitical fast

The hardest-hitting section is his enterprise thought experiment: if a CEO can get most of what they need from an open-source DeepSeek model instead of paying prices like GPT-5.5 at $30 per million output tokens, the choice gets obvious. Berman argues that if US companies and allies build on Chinese models, America risks not just lost AI revenue, but strategic dependence, security exposure, and cultural influence over what AI systems can and can’t say.
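To make the CEO's arithmetic concrete: the $30 per million output tokens for GPT-5.5 is the figure quoted in the episode, but the open-model serving price and the monthly workload below are hypothetical, chosen only to show the shape of the comparison:

```python
# Hedged cost sketch for Berman's enterprise thought experiment.
# CLOSED_PRICE_PER_M comes from the episode; the open-model price and the
# monthly volume are assumed values for illustration, not reported figures.

CLOSED_PRICE_PER_M = 30.00     # $/1M output tokens (GPT-5.5, quoted in episode)
OPEN_PRICE_PER_M = 2.00        # $/1M output tokens (hypothetical open model)
MONTHLY_OUTPUT_TOKENS_M = 500  # 500M output tokens/month (assumed workload)

closed_cost = CLOSED_PRICE_PER_M * MONTHLY_OUTPUT_TOKENS_M
open_cost = OPEN_PRICE_PER_M * MONTHLY_OUTPUT_TOKENS_M

print(f"Closed model: ${closed_cost:,.0f}/mo")
print(f"Open model:   ${open_cost:,.0f}/mo ({closed_cost / open_cost:.0f}x cheaper)")
# → Closed model: $15,000/mo
# → Open model:   $1,000/mo (15x cheaper)
```

At any plausible price gap the multiplier stays large, which is exactly Berman's point: a CFO looking at this spreadsheet doesn't need the open model to win benchmarks, only to be close enough.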

His prescription: open source harder, and slash prices faster

Berman closes by saying the US needs two responses: embrace serious open-source frontier models and push closed-model pricing down much faster. He ends with a jab that if DeepSeek is doing everything right, Anthropic may be doing everything wrong lately — classic Matthew Berman, a little exasperated and very direct.