Matthew Berman · 16m

You NEED to try these 7 loops

TL;DR

  • Loops need just two parts, a trigger and a goal: Berman defines a loop as an autonomous agent workflow kicked off manually, on a schedule, or by an action like opening a PR, then running until a verifiable target or an LLM-judged condition is met.

  • The cleanest loop is a hard metric like page speed: His favorite example is a "sub-50ms page load loop" that uses /goal in Codex or Claude Code to keep optimizing every page, modal, and window until each one loads in under 50 milliseconds.

  • He launched a free loop library with concrete prompts people can copy today: Instead of talking abstractly about agents, he shares ready-to-run loops for overnight docs updates, architecture refactors, logging coverage, production error fixing, SEO-GEO audits, and full product evaluations.

  • Some of the strongest use cases chain loops together: First add comprehensive logging with a logging coverage loop, then run a nightly production error sweep that reads logs, traces root causes, fixes issues, opens a PR, and pings Slack with results.

  • LLM-as-a-judge loops work, but they are the brittle ones: Prompts like "refactor until you are happy with the architecture" or "continue until every scenario meets the quality bar" can produce useful improvements, yet they depend on model taste and judgment rather than deterministic checks.

  • The two big caveats are scope and cost: Berman says loops are a poor fit for greenfield feature building because the model can wander, and they can burn huge token budgets. His attempt to clone Excel to feature parity ran for days before he manually stopped it.
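
The trigger-and-goal pattern from the bullets above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how Codex or Claude Code implement /goal: `run_loop`, `measure`, `improve`, and the simulated page timings are all hypothetical names and data, and the agent step is replaced by a stand-in that halves load times. It shows a deterministic goal check (every page under the target) plus the iteration and token caps that the cost caveat suggests.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class LoopResult:
    met: bool           # True if the verifiable goal was reached
    iterations: int     # number of improvement steps that were run
    tokens_spent: int   # total tokens reported by the improvement steps


def run_loop(
    measure: Callable[[], Dict[str, float]],
    improve: Callable[[Dict[str, float]], int],
    target_ms: float = 50.0,
    max_iterations: int = 20,
    token_budget: int = 100_000,
) -> LoopResult:
    """Run improvement steps until every page is under target_ms,
    or until the iteration cap or token budget is hit."""
    tokens = 0
    for step in range(max_iterations):
        timings = measure()
        slow = {page: ms for page, ms in timings.items() if ms >= target_ms}
        if not slow:
            return LoopResult(True, step, tokens)    # goal verified
        if tokens >= token_budget:
            return LoopResult(False, step, tokens)   # stop before overspending
        tokens += improve(slow)                      # hypothetical agent step
    # One last measurement after the final allowed step.
    met = all(ms < target_ms for ms in measure().values())
    return LoopResult(met, max_iterations, tokens)


# Simulated site (made-up numbers): each "agent step" halves slow pages
# and reports a flat cost of 1,000 tokens.
pages = {"home": 180.0, "settings": 60.0}


def measure_pages() -> Dict[str, float]:
    return dict(pages)


def halve_slow_pages(slow: Dict[str, float]) -> int:
    for page in slow:
        pages[page] /= 2
    return 1_000


result = run_loop(measure_pages, halve_slow_pages)
print(result)
```

The budget cap is the design choice worth noting: a hard metric tells the loop when to stop on success, while the cap bounds a run whose goal is never reached. An LLM-judged goal would replace the `slow` check with a model call, which is where the brittleness described above comes in.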

The Breakdown

A simple /goal prompt can keep an AI coding agent working for 50 minutes or 12 hours straight, fixing performance, docs, logs, SEO, and product quality without waiting for a human after every step. Matthew Berman argues these autonomous "loops" are the next big pattern in AI software building, but warns that they get brittle when the goal is subjective and brutally expensive when a run burns tokens for days.
