Theo - t3.gg · 32m

The language holding our agents back.

TL;DR

  • Bash unlocked coding agents, but it’s a stopgap — Theo argues tools like Cursor, Claude Code, Codex CLI, and T3 Code got dramatically better once models could use bash to search, edit, and run code, but bash lacks the structure, safety model, and portability agents will ultimately need.

  • Stuffing the whole codebase into context is both expensive and worse — he calls out repo-mixing as “the worst possible way to ever code with AI,” saying it cost his company at least $100,000 because 100k+ token prompts are slower, pricier, and make models more random and less accurate.

  • The real leap was letting models retrieve context instead of memorizing it — a 7-token grep command that fetches 30 useful tokens beats handing over 100,000 tokens up front, because deterministic tools like grep reduce the non-determinism that grows with larger context windows.

  • Too many tools poison agent performance just like too much context does — citing Reese’s framing, Theo says agents improved when they had fewer direct tools and a single bash execution layer, because dumping hundreds of tools or MCP specs into context makes models latch onto irrelevant options.

  • TypeScript looks like a stronger execution layer than bash — he points to Cloudflare’s code mode, Vercel’s “just bash,” and Malta’s “just JS” as evidence that letting models write JS/TS in isolated runtimes can cut token usage, improve accuracy, and support safer, typed approvals and permissions.

  • The next big battleground is where agents run and what they’re allowed to do — signed-in browser state, shared approvals, role-based access, destructive-action detection, and lightweight multi-tenant isolation are still unsolved, which Theo frames as a wide-open opportunity for builders.

The Breakdown

From giant prompts to agents that can actually use your machine

Theo opens with a joke about the bad old days: asking ChatGPT what commands to run, then copy-pasting them yourself. The modern shift is tools like Cursor, Claude Code, Codex CLI, and T3 Code giving models access to your system through bash — a huge step forward, but in his words, only “a really important stepping stone.”

Why tokenization and context windows matter more than people realize

He detours into how LLMs actually work: token-by-token autocomplete, where chat history heavily shapes what comes next. His key point is that newer tokenizers handle code far better than GPT-3-era ones did, but context is still finite, and overloaded prompts make models "way dumber" as they approach the limit.

The repo-mix rant: why dumping your codebase is self-sabotage

This is the sharpest section of the video. Theo says tools that compress and paste entire codebases into context have cost his company “at least $100,000” because users fed 100,000+ token blobs into T3 Chat to get cheap coding help, resulting in slower, pricier, and lower-quality outputs. His analogy is simple: if you only need one TypeScript file, the model absolutely does not need every line of Rust in the repo too.
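The arithmetic behind the rant can be sketched with a toy estimate. The ~4 characters-per-token figure is a common rough heuristic, not an exact tokenizer count, and the price constant is a placeholder rather than any real provider's rate:

```typescript
// Toy cost comparison: pasting a repo-mixed blob vs. one relevant file.
// CHARS_PER_TOKEN is a rough heuristic; the price is a made-up placeholder.
const CHARS_PER_TOKEN = 4;
const DOLLARS_PER_1K_INPUT_TOKENS = 0.01;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function estimateCost(text: string): number {
  return (estimateTokens(text) / 1000) * DOLLARS_PER_1K_INPUT_TOKENS;
}

// A repo-mixed blob (~400k chars ≈ 100k tokens) vs. the one file you needed.
const repoBlob = "x".repeat(400_000);
const oneFile = "x".repeat(8_000);

console.log(estimateTokens(repoBlob)); // 100000
console.log(estimateTokens(oneFile));  // 2000
console.log(estimateCost(repoBlob));   // roughly $1 of input per request at the placeholder rate
```

At 100k tokens per message, every follow-up question re-sends the blob, which is how "cheap coding help" quietly compounds into a five-figure bill.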

Bash works because it helps models discover, not remember

Instead of forcing the model to ingest everything, Theo says we should let it generate commands to find the relevant text it needs. A grep or ripgrep command is deterministic in a way the model itself is not, so the trick is using a small amount of generated text to fetch the right context, rather than hoping a huge prompt leads the model to the right place.
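The retrieval pattern can be sketched as a tiny in-memory grep. This is an illustration of the idea, not any agent's actual implementation, and the file contents are invented stand-ins:

```typescript
// Minimal sketch of "retrieve, don't memorize": a deterministic search that
// returns only matching lines plus a little surrounding context, so the model
// reads a few dozen tokens instead of the whole repo.
type Repo = Record<string, string>;

function grep(repo: Repo, pattern: RegExp, context = 1): string[] {
  const hits: string[] = [];
  for (const [path, source] of Object.entries(repo)) {
    const lines = source.split("\n");
    lines.forEach((line, i) => {
      if (pattern.test(line)) {
        const start = Math.max(0, i - context);
        const end = Math.min(lines.length, i + context + 1);
        hits.push(`${path}:${i + 1}\n` + lines.slice(start, end).join("\n"));
      }
    });
  }
  return hits;
}

const repo: Repo = {
  "src/auth.ts":
    "import { db } from './db'\nexport function login(user: string) {\n  return db.find(user)\n}",
  "src/billing.rs": "fn charge(cents: u64) -> Result<(), Error> {\n  todo!()\n}",
};

// A tiny query fetches only the relevant slice; the Rust file never enters context.
console.log(grep(repo, /login/));
```

Given the same query, this always returns the same lines, which is exactly the property the model itself lacks.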

More tokens, more randomness — and why some labs are behind

He frames this as a spectrum from deterministic to random: console.log('hello') sits on one side, AI generation on the other, and “more tokens effectively equals more random.” That’s why he thinks OpenAI, Anthropic, and Chinese labs are ahead by emphasizing tool use, while Google over-optimized for retrieving from huge contexts — which he says is part of why Gemini still underperforms in coding workflows.
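One crude way to make "more tokens equals more random" concrete is an independence toy model. This is an illustrative assumption, not math from the video: treat each generated token as independently staying "on track" with probability p, so an n-token generation stays fully on track with probability p^n:

```typescript
// Toy model (an assumption for illustration): if each token is "on track"
// with independent probability p, a fully on-track n-token generation
// happens with probability p^n — which collapses as n grows.
function pAllOnTrack(p: number, n: number): number {
  return Math.pow(p, n);
}

// Even an extremely reliable per-token rate decays fast with length:
console.log(pAllOnTrack(0.9999, 100));     // ≈ 0.990
console.log(pAllOnTrack(0.9999, 10_000));  // ≈ 0.368
console.log(pAllOnTrack(0.9999, 100_000)); // ≈ 0.000045
```

Under this lens, a 7-token grep command is a bet the model is very likely to win, while a 100k-token prompt multiplies thousands of small chances to drift.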

The execution layer problem is bigger than just a shell

Once Theo gets to the title thesis, the issue becomes clear: bash is too unstructured for approvals, permissions, safety, and multi-user environments. He riffs on real pain points — signed-in state shared across Cursor and OpenCode, approval fatigue that pushes everyone into “dangerously skip permissions mode,” and the inability to know whether a bash command is destructive, read-only, or even allowed.
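The "is this command destructive?" problem can be sketched as a naive allow/deny classifier, which also demonstrates why it fails. Every list and label below is an illustrative assumption, not a real tool's policy:

```typescript
// Sketch of why bash permissioning is hard: classify a command by its verb.
type Verdict = "read-only" | "destructive" | "unknown";

const READ_ONLY = new Set(["ls", "cat", "grep", "rg", "head"]);
const DESTRUCTIVE = new Set(["rm", "dd", "mkfs", "shutdown"]);

function classify(command: string): Verdict {
  const verb = command.trim().split(/\s+/)[0];
  if (DESTRUCTIVE.has(verb)) return "destructive";
  if (READ_ONLY.has(verb)) return "read-only";
  return "unknown";
}

console.log(classify("rg -n 'login' src/"));  // "read-only"
console.log(classify("rm -rf node_modules")); // "destructive"

// The failure modes: bash composes, so the dangerous verb hides inside a
// string, and many binaries can either read or destroy depending on flags.
console.log(classify("bash -c 'rm -rf /'")); // "unknown" — rm is invisible here
console.log(classify("git reset --hard"));   // "unknown" — git log is safe, this isn't
```

Every "unknown" becomes an approval prompt, and enough approval prompts push users straight into the "dangerously skip permissions mode" Theo describes.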

MCP, tool bloat, and the case for code over tool specs

Theo takes another swing at MCP, saying most servers are bloated and make AI worse because their descriptions eat enormous context. He cites Anthropic’s tool-search work and Cloudflare’s code mode as better directions: convert capabilities into TypeScript SDKs the model can discover and write against, so code does the filtering instead of repeatedly bouncing huge tool outputs back through the model. In Cloudflare’s example, token use dropped from 43,500 to 27,000, with latency and benchmark accuracy improving too.
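The "code does the filtering" idea can be sketched as follows. The Issue shape and fetchIssues stub are hypothetical stand-ins for a tool backend, not Cloudflare's actual code-mode API:

```typescript
// Instead of pasting every tool description into context and round-tripping
// raw tool output through the model, expose a small typed surface and let
// model-written code filter locally. Only the tiny summary re-enters context.
interface Issue {
  id: number;
  title: string;
  open: boolean;
}

// Stub standing in for a real MCP/tool backend (hypothetical data).
async function fetchIssues(): Promise<Issue[]> {
  return [
    { id: 1, title: "Login page crashes", open: true },
    { id: 2, title: "Update README", open: false },
    { id: 3, title: "Tokenizer bug in parser", open: true },
  ];
}

// "Model-generated" code: the full issue list stays inside the runtime;
// the model only ever sees the one-line summary string.
async function openIssueSummary(): Promise<string> {
  const issues = await fetchIssues();
  const open = issues.filter((i) => i.open);
  return `${open.length} open: ` + open.map((i) => `#${i.id} ${i.title}`).join("; ");
}

openIssueSummary().then(console.log);
// "2 open: #1 Login page crashes; #3 Tokenizer bug in parser"
```

The type signatures replace the verbose tool-spec prose, and the intermediate data never costs a single token of model context.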

Why TypeScript sandboxes may be the future

The most forward-looking section centers on TypeScript as an execution layer: portable, typed, cheap to isolate, and runnable in environments like Node, V8, browsers, workers, or lightweight isolates. He points to Vercel's just-bash, Malta's just-JS, Reese's Executor, and Dax experimenting with removing bash from OpenCode entirely, then closes with a call to builders: the UI for agents isn't settled, the runtime isn't settled, and this is one of those rare moments where "any one of us can be the one who changed it."
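A minimal sketch of the sandboxing idea, using Node's built-in vm module purely for illustration: the Node docs are explicit that vm is not a security boundary, and real agent runtimes reach for V8 isolates, workers, or containers instead. The `search` capability here is hypothetical:

```typescript
// Run "model-written" JS against a whitelisted capability surface.
// NOTE: node:vm is illustrative only — it is NOT a security mechanism.
import vm from "node:vm";

// The only capabilities the agent code is handed:
const sandbox = {
  results: [] as string[],
  search: (q: string) => (q === "login" ? ["src/auth.ts:2"] : []),
};

// Pretend this string came from the model:
const agentCode = `
  for (const hit of search("login")) results.push(hit);
`;

vm.runInNewContext(agentCode, sandbox, { timeout: 100 });

console.log(sandbox.results); // [ 'src/auth.ts:2' ]
// No fs, no network, no child_process — the code can only call what was passed in.
```

Because the capability surface is just a typed object, approvals and role-based permissions become a question of which functions you hand the sandbox, rather than string-matching bash.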