Mo Bitar · 8 min

OpenAI founder admits AI isn’t working

TL;DR

  • Andrej Karpathy describes both trust and panic with AI coding: Mo Bitar zeroes in on the contradiction. Karpathy says he’s mostly stopped checking some AI-generated code, yet also admits that reviewing it can give him a “heart attack” because it’s bloated, brittle, and full of copy-paste.

  • The real limitation is training coverage, not magic intelligence: Karpathy’s explanation, as retold here, is that if a task isn’t well represented in base training data or reinforcement learning data, “there’s no force on this planet” that will make the LLM solve it reliably.

  • Agentic engineering is drifting toward spec-writing, not puzzle-solving — Instead of LeetCode, Karpathy suggests interviews should test whether someone can define and ship a large project like a secure Twitter clone, including edge cases like tokens, cookie expiration, rate limiting, and password resets.

  • Mo’s core point is that ‘small weird mistakes’ aren’t cute when stakes are high — He pushes back hard on the idea that an agent making nonsensical assumptions in an app like MenuGen is a funny anecdote, especially when similar systems are being pitched for domains like medicine and taxes.

  • The emerging skill is one-shotting agents with better instructions — Mo argues the practical advantage today is knowing what not to hand to Claude until the spec is ready, because the better you front-load the requirements, the less time you waste in back-and-forth repair loops.

  • Even top AI insiders seem unsure about where software careers are heading — By the end, Mo’s takeaway from Karpathy’s interview is less “here’s the roadmap” and more that even someone at Karpathy’s level sounds genuinely uncertain about what skills will matter most next.

The Breakdown

Karpathy’s ‘just stop checking’ moment

Mo opens by introducing Andrej Karpathy as both an OpenAI co-founder and one of the people who helped shape the modern AI stack. He then immediately homes in on a wild quote: Karpathy says he’s been using AI coding tools so much that he’s basically stopped checking the output. Mo treats it less like a productivity hack and more like a surreal confession (“that was your whole job, bro”), and that sets up the video’s central tension.

The contradiction: genius at refactors, clueless at strawberries

Right after that, Mo says Karpathy tells a second story that undercuts the first: these models can refactor a 100,000-line codebase, yet still fail on toy questions like counting the Rs in “strawberry” or deciding whether to drive or walk to a car wash. That contradiction is the whole mystery Karpathy himself is openly wrestling with, and Mo leans into the weirdness instead of trying to smooth it over.

MenuGen and the ‘catastrophic little mistake’ problem

Mo highlights Karpathy’s anecdote about an app called MenuGen, where the agent made an assumption about reusing emails that simply didn’t make sense. Karpathy tells it as one of those charming LLM quirks, but Mo rejects that framing: if the mistake could have been catastrophic had it not been caught manually, then it’s not cute. It’s exactly the reliability issue critics keep pointing at.

‘Heart attack code’ and the slop underneath the hype

The sharpest moment in the recap is Karpathy’s own admission that when he actually inspects AI-written code, it can be “bloaty,” full of copy-paste, and built on awkward, brittle abstractions. Mo pounces on that line, riffing that startups may soon need a ping-pong table, espresso machine, and a defibrillator next to the pull requests. The joke lands because the underlying point is serious: the code often works, but it’s gross.

Agentic engineering starts with markdown specs

From there, Mo summarizes Karpathy’s workflow: write an extremely detailed markdown document covering exactly what the software should do, every edge case included, and let the model generate code from that. Mo’s reaction is basically: this sounds an awful lot like writing software requirements in English and hoping the machine doesn’t wander off. He also notes Karpathy’s frustration when the model insists a messy solution can’t be simplified, even when a strong engineer can plainly see the cleaner path.

RL, autocomplete, and why some tasks just won’t click

Karpathy’s explanation, as Mo tells it, keeps returning to reinforcement learning layered on top of base training data. If your task isn’t represented in those datasets, the model just won’t acquire the capability — which is why Karpathy ends up calling LLMs a very sophisticated autocomplete. Mo sees that honesty as valuable, even if it clashes with the broader accelerationist tone around vibe coding and agents.

Hiring is changing faster than interviews are

The most practical part of the video is the hiring advice. Karpathy notes that companies say they want agentic engineers but still interview as if it’s 2021, using LeetCode and puzzle problems. His proposed replacement is simple and concrete: give candidates a big project, like building a secure Twitter clone, and evaluate the spec they write and the edge cases they anticipate. Those edge cases include tokens, session length, cookie expiration, change-password flows, rate limiting, and even recommendation systems.

Even Andrej sounds a little lost

Mo closes on a surprisingly human note. The interviewer asks Karpathy what skills will still matter if AI keeps improving, and Karpathy doesn’t seem to have a satisfying answer. Mo’s takeaway isn’t that Karpathy is weak; it’s that the question is genuinely unresolved. If you feel disoriented about the future of software work, Mo says, there’s some comfort in knowing that even Andrej Karpathy does too.
