Reacting to "Why AI is so smart but also so dumb?"
TL;DR
Karpathy says December was a real inflection point for coding agents — he describes going from fixing AI-generated code chunks to “I can’t remember the last time I corrected it,” which Matthew Berman says matches what frontier users felt as models plus agent harnesses suddenly became end-to-end useful.
The core shift is from specifying steps to verifying outcomes — Karpathy’s “software 3.0” framing says classic software works when you can write explicit rules, while LLM systems thrive when you can judge whether the final artifact is correct, which is why code and math have advanced so fast.
AI looks genius in code and idiotic elsewhere because training rewards are jagged — Karpathy argues frontier labs pour reinforcement learning into highly verifiable domains like coding, so models can refactor huge codebases or find zero-days yet still fail commonsense prompts like whether to drive 50 meters to a car wash.
Agent-native products are rewriting UX around copy-paste prompts, not human setup flows — examples like OpenClaw, Here Now, and Berman’s own Journey Kits reduce installation to a tiny instruction block for an agent, replacing long bash scripts and step-by-step docs with “here’s the outcome, go do it.”
Karpathy draws a sharp line between vibe coding and agentic engineering — vibe coding “raises the floor” so anyone can build software, while agentic engineering is about keeping professional quality bars intact as teams orchestrate dozens of flaky but powerful agents in parallel.
His most durable takeaway is human understanding still matters even if thinking is outsourced — the quote he keeps revisiting is “you can outsource your thinking but you can’t outsource your understanding,” meaning people still have to know what they’re building, why it matters, and how to direct the agents.
The Breakdown
December Was the Moment Coding Agents Stopped Feeling Like Toys
Karpathy opens with a line that catches the room: he’s “never felt more behind as a programmer.” He says that around December, the latest models crossed a threshold where code didn’t just come out mostly right — it kept coming out right, enough that he stopped correcting it and started “vibe coding.” Berman jumps in hard here, saying this matches exactly what people on the frontier felt as agentic coding went from snippets to entire working apps.
Software 3.0: The LLM as the Computer
The next big idea is Karpathy’s old framework: software 1.0 is handwritten code, software 2.0 is training neural nets, and software 3.0 is programming through prompts and context. He describes the LLM as the new CPU and the context window as RAM, with prompting becoming the way you steer the machine. Berman reinforces that this is not just “better software” — it’s a different computing paradigm.
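The three paradigms can be sketched with one toy task solved three ways. This is my illustration, not code from the talk; the `llm` callable is a hypothetical stand-in for any chat-completion API.

```python
# Toy task: decide whether a product review is positive.

# Software 1.0 — handwritten rules: you explicitly specify the steps.
def is_positive_v1(text: str) -> bool:
    positive = {"great", "love", "excellent"}
    negative = {"terrible", "hate", "awful"}
    words = set(text.lower().split())
    return len(words & positive) > len(words & negative)

# Software 2.0 — a trained network: the "program" is learned weights.
# (Sketch only: in practice you would train a classifier on labeled data.)
# weights = train(labeled_reviews)

# Software 3.0 — the prompt is the program; the LLM plays the role of the
# CPU and the context window is its working memory.
PROMPT = "Answer YES or NO: is this review positive?\n\n{review}"

def is_positive_v3(text: str, llm) -> bool:
    # `llm` is a hypothetical callable wrapping a chat-completion API.
    return "YES" in llm(PROMPT.format(review=text)).upper()

print(is_positive_v1("I love this, excellent food"))   # True
print(is_positive_v1("terrible, I hate it"))           # False
```

The point of the framing is visible in the contrast: version 1 only works because the rules are enumerable, while version 3 is steered entirely through natural-language context.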
Why Agent-First Products Look So Weird
Karpathy uses OpenClaw as the concrete example: instead of a giant installation bash script, the install is just a block of text you paste into your agent. That’s the whole point — stop micromanaging exact steps, define the outcome and let the model inspect the environment, debug, and adapt. Berman connects this to products like Here Now and Journey Kits, where the most important UI element is basically “copy prompt for my agent.”
The Menu App Story Shows the End-to-End Neural Net Creep
Karpathy tells a great builder story: he made a menu app the old way with OCR, generated images, rendering logic, and hosting infrastructure — then saw a software 3.0 version that simply handed a photo to Gemini and told it to use “Nanobanana” to overlay items directly into the image. His reaction is basically: this app shouldn’t even exist in the old form. For Berman, this is the “bitter lesson” in action — never bet against end-to-end neural networks swallowing more of the stack.
Verifiability Explains Why AI Is Brilliant and Also Ridiculous
This is the heart of the video. Karpathy says traditional computers automate what you can specify, while LLMs automate what you can verify, and that’s why models become “jagged entities” with spikes in domains like math and code. Berman gives the memorable contrast: a model can refactor a massive codebase or find zero-days, then completely whiff on something like whether to walk or drive 50 meters to a car wash.
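The specify-vs-verify distinction maps directly onto how rewards are built for reinforcement learning on code. Here is a minimal Python sketch (my illustration, not from the video): the grader never specifies how to write the function, it only verifies the produced artifact by running tests — which is exactly what makes code such a trainable domain.

```python
# A verifiable reward: we don't specify HOW to solve the task,
# we only check WHETHER the produced artifact is correct.

def code_reward(solution_src: str) -> float:
    """Return 1.0 if the model's code passes all tests, else 0.0."""
    namespace: dict = {}
    try:
        exec(solution_src, namespace)      # run the model's candidate code
        fn = namespace["add"]              # toy task: define `add(a, b)`
        tests = [((2, 3), 5), ((-1, 1), 0), ((0, 0), 0)]
        ok = all(fn(*args) == expected for args, expected in tests)
        return 1.0 if ok else 0.0
    except Exception:
        return 0.0                         # crashing code earns no reward

# Two candidate "model outputs":
good = "def add(a, b):\n    return a + b"
bad  = "def add(a, b):\n    return a - b"

print(code_reward(good), code_reward(bad))  # 1.0 0.0
```

The car-wash question has no such cheap, unambiguous checker, which is one way to read the “jagged” capability profile: reward flows where a verifier exists.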
Founders: The Labs Will Eat the Obvious Verifiable Markets
When asked what founders should build, Karpathy’s answer is subtle but brutal: if a domain is highly verifiable, the frontier labs can eventually move into it fast, even if they aren’t targeting it today. There’s still room if you can create your own RL environments and fine-tune aggressively, but his tone suggests the obvious coding/math-like categories are not safe little niches. Berman lingers on Karpathy’s even bigger claim — that almost everything may become verifiable eventually.
Vibe Coding Raises the Floor; Agentic Engineering Raises the Ceiling
Karpathy gives the cleanest distinction in the whole talk. Vibe coding lets anyone build and experiment, but agentic engineering is the discipline of using spiky, stochastic agents without lowering the quality bar expected in professional software. Berman extends that with examples like Peter Steinberger running dozens or even 100 agents in parallel across coding, bugs, deployment, and PR management.
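The “dozens of flaky agents in parallel” workflow is essentially fan-out plus verification. A minimal asyncio sketch of that pattern, assuming a hypothetical `run_agent` stub in place of a real coding agent: launch tasks concurrently, check each result, and retry failures, since individual agents are stochastic.

```python
import asyncio
import random

async def run_agent(task: str) -> str:
    """Hypothetical stand-in for a call to a real coding agent."""
    await asyncio.sleep(0.01)              # pretend to do work
    # Flaky on purpose: sometimes returns a broken result.
    return f"patch for {task}" if random.random() < 0.7 else "ERROR"

def verify(result: str) -> bool:
    # In real use: run the test suite / linters on the agent's output.
    return result != "ERROR"

async def fan_out(tasks: list[str], attempts: int = 3) -> dict[str, str]:
    """Run one agent per task concurrently, retrying flaky failures."""
    async def solve(task: str) -> tuple[str, str]:
        for _ in range(attempts):
            result = await run_agent(task)
            if verify(result):             # the quality bar lives here
                return task, result
        return task, "UNRESOLVED"
    pairs = await asyncio.gather(*(solve(t) for t in tasks))
    return dict(pairs)

results = asyncio.run(fan_out(["fix bug #12", "write deploy script"]))
print(results)
```

The quality bar Karpathy insists on lives in `verify`: the orchestration is trivial, and the engineering discipline is deciding what counts as passing before any agent output is accepted.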
Taste, Ghosts, and the Agent-First Internet
In the final stretch, Karpathy says today’s agents are still like interns: powerful, but humans remain responsible for oversight, aesthetics, judgment, and taste. He revisits his “animals versus ghosts” framing to argue these systems aren’t creatures with intrinsic motivation so much as strange summoned intelligences shaped by data and reward functions. Then he lands on the line Berman clearly wants to end on: “you can outsource your thinking but you can’t outsource your understanding,” which becomes the video’s real thesis about surviving the agent era.