Matthew Berman · May 1, 2026 · 34m

Why AI Is Brilliant and Stupid

TL;DR

Karpathy says December was a real inflection point for coding agents — he went from fixing AI-generated code to “I can’t remember the last time I corrected it,” which Matthew Berman frames as the moment frontier users felt models plus tool harnesses become end-to-end useful.
The core shift is from specifying steps to verifying outcomes — Karpathy’s line is that traditional software automates what you can specify in code, while LLMs automate what you can verify, which explains why code and math have improved so quickly.
“Software 3.0” means prompting an LLM like it’s the computer itself — building moves from writing explicit rules or training task-specific models to steering a general model with context, with the LLM acting as CPU and the context window as RAM.
AI’s brilliance and stupidity come from jagged intelligence, not general intelligence — the same model that can refactor a 100,000-line codebase or find zero-days can still fail common-sense questions like whether to walk 50 meters to a car wash, because labs heavily reward verifiable domains.
Karpathy’s founder advice is blunt: verifiable domains are tractable, but labs will likely absorb the obvious ones — if a problem can be turned into strong RL environments and easy checks, startups may still fine-tune successfully, but they’re also building where foundation-model companies can move fastest.
Vibe coding raises the floor; agentic engineering raises the ceiling — anyone can now build rough software with AI, but professional teams still need taste, orchestration, and quality control, with Karpathy comparing today’s agents to brilliant but unreliable interns.

Summary

December Was the “Oh, This Is Different” Moment

The conversation opens with Karpathy admitting he’s “never felt more behind as a programmer,” which lands because it’s Andrej Karpathy saying it, not some random hype account. He describes a clear break around December: coding agents stopped being useful only for snippets and started producing larger chunks that “just came out fine,” until he was trusting them enough to start vibe coding in earnest.

Software 3.0: The LLM as the New Computer

Karpathy revisits his framework: software 1.0 is handwritten code, software 2.0 is programming via datasets and learned weights, and software 3.0 is prompting a general-purpose model through context. Berman reinforces the mental model with Karpathy’s old diagram: the LLM is basically the CPU, the context window is RAM, and the surrounding tools are just peripherals around this new neural computer.
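
To make the three paradigms concrete, here is a minimal sketch on a toy sentiment task. It is an illustration, not anything shown in the talk: the function names, the word lists, and the `llm` callable (a stand-in for whatever model client you use) are all assumptions.

```python
from collections import Counter
from typing import Callable

# Software 1.0: behaviour is written by hand as explicit rules.
def sentiment_v1(text: str) -> str:
    negative_words = {"bad", "awful", "broken"}  # hypothetical rule set
    hit = any(word in negative_words for word in text.lower().split())
    return "negative" if hit else "positive"

# Software 2.0: behaviour is learned from labelled data; the "program" is the weights.
def train_v2(examples: list[tuple[str, str]]) -> Counter:
    weights: Counter = Counter()
    for text, label in examples:
        for word in text.lower().split():
            weights[word] += 1 if label == "positive" else -1
    return weights

def sentiment_v2(weights: Counter, text: str) -> str:
    score = sum(weights[word] for word in text.lower().split())
    return "positive" if score >= 0 else "negative"

# Software 3.0: behaviour is steered by what goes into the context window.
def build_prompt(text: str) -> str:
    return (
        "Classify the sentiment of the review as 'positive' or 'negative'.\n"
        f"Review: {text}\nAnswer:"
    )

def sentiment_v3(llm: Callable[[str], str], text: str) -> str:
    # `llm` is a hypothetical callable: prompt in, completion out.
    return llm(build_prompt(text)).strip().lower()
```

In the third version the only thing the builder writes is the prompt; swapping the task means changing the context, not the code or the weights.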

Why Agent-Native Products Feel Weirdly Simple

One of Karpathy’s examples is OpenClaw installation: instead of a giant cross-platform bash script, the “installer” is just text you paste to your agent. That’s the paradigm shift in one tiny example — don’t over-specify steps, state the outcome and let the model inspect the environment, debug, and act. Berman ties this to products like Here and his own Journey Kits, where setup instructions have shrunk into a few lines of agent-facing text.

The End-to-End Neural Net Keeps Eating More of the Stack

Karpathy tells a story about building a menu app the old way — OCR, image generation, rendering, hosting — only to see a software 3.0 version that simply gives the photo to Gemini and asks it to overlay the menu items directly into the image. His reaction is basically: that whole app “shouldn’t exist.” Berman connects this to Tesla and the “bitter lesson”: once end-to-end neural nets get good enough, hand-authored heuristics start looking like technical debt.

Verifiability Explains Why AI Is a Genius and an Idiot

This is the heart of the talk. Karpathy says LLMs excel where outputs can be verified, because training now looks like giant RL environments with clear rewards, which creates “jagged” capability spikes in domains like coding and math. That’s why a model can crush refactors and security work yet still tell you to walk 50 meters to a car wash — wildly competent in high-reward, checkable spaces, weirdly dumb at simple everyday judgment.
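
A rough sketch of what "verifiable" means in practice: for code, a reward can be computed mechanically by running a candidate against known cases. The `reward` function and its parameters below are hypothetical, and real RL environments are far more elaborate (sandboxing, timeouts, richer scoring); this only shows why a checkable domain yields a clean training signal.

```python
def reward(candidate_source: str, func_name: str,
           cases: list[tuple[tuple, object]]) -> float:
    """Return the fraction of test cases the candidate code passes."""
    if not cases:
        return 0.0
    namespace: dict = {}
    try:
        # Run the candidate definition. Only safe here because the input is trusted;
        # a real environment would sandbox this step.
        exec(candidate_source, namespace)
        func = namespace[func_name]
    except Exception:
        return 0.0  # code that does not even load earns nothing
    passed = 0
    for args, expected in cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash on one case counts as a failure for that case
    return passed / len(cases)


cases = [((1, 2), 3), ((0, 0), 0)]
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"
print(reward(good, "add", cases))  # 1.0
print(reward(bad, "add", cases))   # 0.5
```

No comparable checker exists for an everyday judgment call like the car wash question, which is one way to read the "jagged" profile: the signal is strong where a function like this can be written and weak where it cannot.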

What Founders Should Build While Labs Race Ahead

Asked what startups should do if labs are already dominating coding and math, Karpathy says verifiable domains are still tractable because founders can create their own RL environments and fine-tune on them. But there’s a catch: those same properties also make them easy for the big labs to swallow eventually. His half-teasing, half-frustrating answer is that there are valuable RL environments people aren’t focusing on yet — but he won’t quite say which ones.

Vibe Coding vs. Agentic Engineering

Karpathy draws a clean distinction: vibe coding raises the floor so anyone can build software, whereas agentic engineering keeps the professional quality bar and uses these “spiky,” stochastic agents to go faster. Berman loves that framing and adds examples like Peter Steinberger running dozens or even 100 agents in parallel across coding, deployment, bug-finding, and PRs: not just prompting, but orchestration as a real engineering discipline.
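
One way to picture orchestration as a discipline is a loop that fans tasks out to agents in parallel and only accepts results that pass a check. This is a sketch under assumptions, not a description of anyone's actual setup: `agent` and `check` are hypothetical callables standing in for a real agent runner and a real quality gate such as a test suite or review step.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def run_agents(
    tasks: list[str],
    agent: Callable[[str], str],        # hypothetical: task description -> result
    check: Callable[[str, str], bool],  # hypothetical: quality gate for a result
    max_workers: int = 8,
) -> tuple[dict[str, str], list[str]]:
    """Run every task through an agent in parallel, then gate each result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input tasks.
        results = list(pool.map(agent, tasks))
    accepted: dict[str, str] = {}
    needs_review: list[str] = []
    for task, result in zip(tasks, results):
        if check(task, result):
            accepted[task] = result
        else:
            needs_review.append(task)  # send back to a human or retry
    return accepted, needs_review
```

The vibe-coding version of this loop would skip `check` entirely; keeping the gate and deciding what goes into it is where the human judgment sits.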

Agents, Taste, and the Rebuilt Internet

Near the end, Karpathy says agents today are basically intern-like: powerful, but still needing human oversight, judgment, aesthetics, and direction. He predicts a world of agent-first infrastructure and agent-to-agent interaction, complaining that docs are still written for humans when what he wants is simply “the thing I should copy paste to my agent.” He closes with the line he can’t stop thinking about: “You can outsource your thinking, but you can’t outsource your understanding,” which becomes the video’s real warning label.
