Theo - t3.gg · June 18, 2026 · 24m

I guess we're writing loops now?

TL;DR

A single loop turned into multiple self-made subloops: Theo asked whether an agent could create a workflow that opens PRs, reviews them, fixes comments, merges them, and starts the next piece. It did exactly that for a stacked refactor while he slept.
The real bottleneck is the human glue work between prompts: He argues the important automation opportunity is not the initial coding prompt but everything that follows it, including running the app, checking behavior, committing, filing a PR, collecting CodeRabbit or Macroscope feedback, and sending fixes back.
Predefined agent personas still feel like nonsense to him: Theo pushes back on the trend of hardcoding roles like 'security reviewer' or 'adversarial reviewer,' saying the point of agents is dynamic context-building, not markdown theater.
Codex thread spawning made loops click for him: The key product insight was realizing a Codex thread can spin up another thread, which lets an orchestrator direct parallel work and keep polling every 5 to 10 minutes for updates or review comments.
Loops are powerful but can burn absurd amounts of tokens: One Opus workflow ran for 8 hours and spent more than 3 million tokens addressing roughly three small review comments, which is fine on a flat subscription but painful at API pricing.
Theo says expensive plans only make sense if you push them hard: He estimates about $10,000 of June inference across machines while paying for three $200 plans, framing unused weekly quota as paid-for compute that builders should actively spend on ambitious experiments.
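
The cost claims in the last two points come down to simple arithmetic, sketched below. The function names are hypothetical, the $200 plan price is assumed to be monthly, and the per-token rate in the usage lines is a placeholder rather than a real price; substitute the published rate for whichever model you use.

```python
def plan_leverage(inference_value_usd: float, plan_price_usd: float, plan_count: int) -> float:
    """Ratio of inference consumed (valued at API prices) to what the flat plans cost."""
    return inference_value_usd / (plan_price_usd * plan_count)


def api_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of a run at pay-per-token pricing, given a blended rate per million tokens."""
    return tokens / 1_000_000 * usd_per_million_tokens


# Figures from the summary: about $10,000 of inference on three $200 plans.
leverage = plan_leverage(10_000, 200, 3)

# The 3-million-token overnight run, priced at a PLACEHOLDER rate (assumption, not a real price).
PLACEHOLDER_USD_PER_MILLION = 50.0
overnight_cost = api_cost_usd(3_000_000, PLACEHOLDER_USD_PER_MILLION)
```

With the summary's numbers, the three plans cost $600 in total, so the estimated inference is worth roughly 16 to 17 times what was paid for it.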

The Breakdown

Theo let an agent design its own multi-PR workflow, then went to bed and woke up to four stacked pull requests reviewed and merged by 6:50 a.m. His takeaway is blunt: stop handholding coding agents with one-off prompts and start building loops where agents do the next step, review themselves, and keep going.
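
The loop described here (open a PR, poll for review comments, fix them, merge, move to the next piece) can be sketched roughly as follows. This is a minimal illustration and not the workflow Theo's agent actually built: `run_stack` and every callable passed into it are hypothetical stand-ins for whatever the agent, the Git host, and the review bot really expose, and the 300-second default only mirrors the 5 to 10 minute polling mentioned in the TL;DR.

```python
import time
from typing import Callable, List, Sequence


def run_stack(
    pieces: Sequence[str],
    open_pr: Callable[[str], str],             # hypothetical: starts work on a piece, returns a PR id
    fetch_comments: Callable[[str], List[str]],  # hypothetical: unresolved review comments on a PR
    fix_comment: Callable[[str, str], None],   # hypothetical: asks an agent to address one comment
    merge_pr: Callable[[str], None],           # hypothetical: merges a PR that has no open comments
    poll_seconds: float = 300,                 # assumption: poll every 5 minutes
    max_polls: int = 50,                       # cap so a stuck review cannot burn tokens forever
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Work through a stack of PRs one piece at a time and return the merged PR ids."""
    merged: List[str] = []
    for piece in pieces:
        pr = open_pr(piece)
        for _ in range(max_polls):
            comments = fetch_comments(pr)
            if not comments:
                # Nothing left to address: merge and move on to the next piece.
                merge_pr(pr)
                merged.append(pr)
                break
            for comment in comments:
                fix_comment(pr, comment)
            # Wait before checking for fresh review feedback.
            sleep(poll_seconds)
        else:
            # The PR never came back clean; stop, since later pieces stack on this one.
            break
    return merged
```

The `max_polls` cap is a deliberate design choice in this sketch: an unbounded loop is what allows a handful of small comments to consume hours of tokens.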