Why and How You Need to Sandbox AI-Generated Code — Harshil Agrawal, Cloudflare
TL;DR
- AI-generated code is just untrusted internet code in nicer packaging — Harshil Agrawal says the real security reframing is simple: if you wouldn’t run a random snippet from the web in production with your credentials, you shouldn’t do it just because an LLM wrote it.
- The baseline threats are already enough to justify sandboxing — He breaks the risk into three concrete buckets: hallucinations like while true loops or fake imports, “helpful” code that reads env vars to configure itself, and prompt injection that can exfiltrate secrets to an attacker-controlled URL.
- Capability-based security beats blocklists for LLM code — His core principle is “default deny, explicitly allow,” comparing it to giving someone keys to only three rooms instead of a master key plus a list of 10,000 forbidden rooms.
- Use isolates when you need speed and tight control, containers when you need a real OS — Cloudflare V8 isolates start in about a quarter millisecond and are ideal for tool calls and small generated functions, while containers are necessary for workflows like git clone, npm install, dev servers, and preview URLs.
- In practice, Harshil uses both patterns in production apps — One app runs AI-written JavaScript skills inside Cloudflare dynamic worker isolates with globalOutbound: null; another, Prompt Motion, gives each user their own container to generate motion graphics, install dependencies, and serve a live preview.
- His universal checklist is blunt and practical: no broad network, no secrets in the sandbox, one sandbox per user — He closes with eight rules, including default-deny networking, proxying secrets through your worker, setting CPU/memory/time limits, logging everything, validating inputs, and destroying idle containers with try/finally.
The Breakdown
The scary reframe: you’re running internet code with production privileges
Harshil opens with the line that should make any AI builder sweat: we’ve gone from autocomplete to agents that write, execute, review, and iterate on code in just two years — but under the hood, we’re often just running untrusted code from a black-box model. His point lands because it’s so obvious once he says it: if someone pitched “I found this snippet on a random website, let’s run it in prod,” you’d call that security malpractice.
Three failure modes that aren’t theoretical anymore
He walks through the threat model in escalating order. First is plain hallucination: nonexistent packages, recursive functions with no base case, while true loops — not malicious, just catastrophic anyway. Then comes the “helpful” LLM, which reads env vars or secrets because it thinks that’s the sensible way to configure a database, and finally prompt injection, including the nastier indirect version where a web page or document smuggles instructions into the model.
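The env-var and prompt-injection failure modes are easy to reproduce in miniature: a naive executor that evals generated code hands it every ambient capability the host process has, including process.env. The "generated" snippet, the secret, and the attacker sink below are all made up for illustration; this is a sketch of the threat, not of any real app.

```javascript
// A naive executor: run whatever the model produced, with full ambient
// authority. Everything here is illustrative: the "generated" code, the
// secret, and the attacker sink are invented for the demo.
const exfiltrated = []; // stand-in for an attacker-controlled endpoint

process.env.DATABASE_URL = "postgres://admin:hunter2@prod-db/main";

// Imagine this string came back from the model after indirect prompt injection:
const generatedCode = `
  // Looks like helpful config code, but it leaks the secret.
  sendToAttacker(process.env.DATABASE_URL);
`;

function naiveRun(code) {
  // Direct eval shares the host's scope: process.env and this helper
  // are both reachable from the untrusted snippet.
  const sendToAttacker = (data) => exfiltrated.push(data);
  eval(code);
}

naiveRun(generatedCode);
console.log(exfiltrated); // the secret has left the building
```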
Capability-based security: stop blocking, start withholding
This is the conceptual center of the talk. Harshil argues that blocklists are a losing game because you have to predict every dangerous syscall, API, and attack path, whereas capability-based security flips it: default deny everything, then grant only the minimum needed. His metaphor is memorable — don’t hand out a master key and a giant list of forbidden rooms; hand out keys to only the three rooms someone actually needs.
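One way to make "default deny, explicitly allow" concrete in plain JavaScript is to hand the generated code its capabilities as explicit arguments and nothing else. This is a simplified stand-in for a real isolate (new Function alone is not a security boundary in Node, since globals like process remain reachable), but it shows the shape of the idea: the code can only open the rooms you gave it keys to.

```javascript
// Default deny: generated code receives ONLY the capabilities passed in.
// new Function creates the function outside the local scope, so local
// variables here are invisible to it. Note this is a sketch of the
// capability pattern, not a hardened sandbox: Node globals still leak.
function runWithCapabilities(code, capabilities) {
  const names = Object.keys(capabilities);
  const fn = new Function(...names, `"use strict";\n${code}`);
  return fn(...names.map((n) => capabilities[n]));
}

// Explicitly allow: keys to two rooms, not a master key.
const logs = [];
const capabilities = {
  log: (msg) => logs.push(msg),
  add: (a, b) => a + b,
};

const result = runWithCapabilities(
  `log("computing"); return add(2, 3);`,
  capabilities
);
console.log(result); // 5
console.log(logs);   // ["computing"]
```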
Why isolates are his go-to for fast AI tool execution
For lightweight AI-generated functions, he uses Cloudflare dynamic worker isolates — separate V8 runtimes with their own memory, execution context, and global scope. The killer detail is how little configuration it takes to lock them down: pass the code as a module, set globalOutbound to null to kill network access, and expose only narrow bindings like a restricted database.query and a logger. He describes it like “a room with no doors or windows,” where the only available objects are the ones he placed there before locking it.
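The lockdown he describes can be sketched as a configuration object. The loader call that consumes it is Cloudflare-specific and omitted here, and the field names (mainModule, modules, env, globalOutbound) follow the pattern described in the talk rather than verbatim API usage, so check them against Cloudflare's docs. The key detail is globalOutbound: null plus only two narrow bindings.

```javascript
// Sketch of the lockdown configuration for a dynamic worker isolate.
// Field names follow the talk's description and should be verified
// against Cloudflare's documentation before use.
const generatedSkill = `
  export default {
    async run(env) {
      env.logger.info("skill running");
      return env.database.query("SELECT 1");
    }
  };
`;

const isolateConfig = {
  mainModule: "skill.js",
  modules: { "skill.js": generatedSkill }, // code passed in as a module
  globalOutbound: null,                    // no network: a room with no doors
  env: {
    // Only the objects placed in the room before locking it:
    database: { query: (sql) => `ran: ${sql}` }, // restricted query surface
    logger: { info: (msg) => console.log(msg) },
  },
};

console.log(isolateConfig.globalOutbound); // null: outbound fetch is cut off
```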
The isolate demo: AI-written Hacker News skills without shell access
His first app is an OpenClaw-style agent that can generate its own skills, but unlike OpenClaw, it can’t execute shell commands. In the demo, the agent writes a JavaScript skill to fetch top Hacker News stories, reasons through the task, makes a tool call, and runs the result live inside an isolate. The point isn’t just that it works — it’s that the code gets exactly the capabilities it needs and nothing more.
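The demo's pattern, "exactly the capabilities it needs," might look like the sketch below: the generated skill receives a fetch capability scoped to a single origin, and nothing else. The wrapper names are illustrative and the transport is stubbed so this runs offline; in the real app the isolate's binding would perform the request.

```javascript
// Sketch: a generated "top stories" skill runs with a fetch capability
// that only accepts one allowed origin. The transport is stubbed so the
// example runs offline; names here are illustrative.
function makeScopedFetch(allowedOrigin, transport) {
  return (url) => {
    if (!url.startsWith(allowedOrigin)) {
      throw new Error(`capability denied: ${url}`);
    }
    return transport(url);
  };
}

// Pretend this function body came back from the model as a "skill":
const skill = async ({ fetchJson }) => {
  const ids = await fetchJson(
    "https://hacker-news.firebaseio.com/v0/topstories.json"
  );
  return ids.slice(0, 3); // top three stories
};

// Stubbed transport standing in for the real network binding.
const fakeTransport = async () => [101, 102, 103, 104];
const fetchJson = makeScopedFetch(
  "https://hacker-news.firebaseio.com/",
  fakeTransport
);

skill({ fetchJson }).then((top) => console.log(top)); // [ 101, 102, 103 ]
```

Any URL outside the allowed origin throws before a request is even attempted, which is the capability model in one line: the skill never had the authority to talk to anywhere else.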
When the job changes, the sandbox has to change too
The second app, Prompt Motion, needs a totally different execution model. Users describe a motion graphic in natural language, and the system has to clone a starter repo, install NPM dependencies, run a build, launch a dev server, and return a live preview URL — requirements isolates simply can’t meet, because there’s no filesystem, no processes, and no long-running servers. His conclusion is blunt: this is container territory, full stop.
Prompt Motion and the non-negotiables of container isolation
Here Harshil gets practical about what production safety actually looks like. Each user gets their own sandboxed container, their own filesystem, their own processes — user B’s files don’t merely throw “permission denied” for user A; they literally do not exist in that universe. He’s especially emphatic about secrets: never pass API keys into the sandbox; instead, proxy through your worker so the key stays outside the container and the generated code never touches it.
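The secrets rule can be sketched as a proxy: the sandboxed code receives a narrow "call the upstream API" function, while the host keeps the key on its side of the boundary. The names below (hostProxy, callUpstream) are illustrative and the network call is stubbed; the point is that the env object handed into the sandbox contains a function, never a key.

```javascript
// Host side (the worker): holds the API key, never ships it into the sandbox.
const API_KEY = "sk-not-for-sandbox-eyes"; // illustrative secret

async function hostProxy(path, body) {
  // In production this would be a real request with an Authorization
  // header built from API_KEY; stubbed here so the sketch runs offline.
  return { path, authorized: true, echo: body };
}

// The ONLY thing the sandboxed code receives: a function, not a key.
const sandboxEnv = {
  callUpstream: (path, body) => hostProxy(path, body),
};

// Generated code running "inside" the sandbox:
async function generatedCode(env) {
  // env has no API_KEY property to read; the key stays outside.
  return env.callUpstream("/v1/render", { scene: "bouncing ball" });
}

generatedCode(sandboxEnv).then((res) => console.log(res.authorized)); // true
```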
The final checklist: the eight habits that put you ahead of most AI apps
He closes with a screenshot-worthy checklist that applies whether you use Cloudflare or anything else: default-deny network, explicit capabilities, per-user isolation, resource limits, secrets outside the sandbox, aggressive cleanup, full logging, and input validation before execution. The final line is the thesis of the whole talk: the same model that writes a beautiful React component can also be tricked into exfiltrating your database, so treat AI-generated code like code from an anonymous contributor and sandbox it every single time.
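Two of those rules, resource limits and aggressive cleanup with try/finally, combine naturally into one wrapper. The container object below is a stub standing in for a real per-user sandbox handle from whatever provider you use; the shape of the wrapper is the point.

```javascript
// Sketch: run a job against a per-user sandbox with a hard time limit,
// and destroy the sandbox no matter how the job ends. The container is
// a stub; a real handle would come from your sandbox provider.
function makeContainer() {
  return { destroyed: false, destroy() { this.destroyed = true; } };
}

async function withSandbox(job, { timeoutMs = 1000 } = {}) {
  const container = makeContainer();
  let timer;
  try {
    return await Promise.race([
      job(container),
      new Promise((_, reject) => {
        timer = setTimeout(
          () => reject(new Error("time limit exceeded")),
          timeoutMs
        );
      }),
    ]);
  } finally {
    clearTimeout(timer);
    container.destroy(); // runs on success, failure, and timeout alike
  }
}

// Even a crashing job leaves no idle container behind:
withSandbox(async () => { throw new Error("generated code blew up"); })
  .catch((err) => console.log(err.message)); // "generated code blew up"
```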