
Playbook
Tasteful Skills
“Tasteful Skills” argues that the best agent skills are not documentation or best-practice lists.
The real bottleneck is human attention, not model intelligence — Luke Alvoeiro argues today’s models are already smart enough to tackle a backlog of 50 tasks, but humans can only supervise a few at a time, which is why Factory built a system that can keep shipping for hours or even days.
Factory’s “missions” package multi-agent patterns into one long-running workflow — instead of a single coding session, missions combine delegation, creator-verifier, broadcast, and negotiation across three roles: orchestrator, workers, and validators.
Validation is the whole game, and it starts before any code exists — missions write a “validation contract” during planning, sometimes with hundreds of assertions, so tests don’t just rubber-stamp implementation decisions after the fact.
Parallel agents sound fast but usually collide in software work — after trying it, Factory found the coordination overhead from conflicting changes and duplicated work outweighed the gains, so missions run features serially with targeted parallelism only for read-only tasks like search and code review.
The system’s longest mission ran 16 days, and most wall-clock time wasn’t spent on tokens — it was spent in behavioral validation, where QA-style agents actually launch the app, click through flows, fill forms, and verify end-to-end behavior.
Model choice becomes a new engineering skill Luke calls “droid whispering” — planning, implementation, and validation each want different model strengths, and Factory treats model-agnostic routing as a structural advantage, even using open-weight models successfully when the workflow scaffolding is strong.
Luke Alvoeiro opens with a pretty blunt thesis: software engineering isn’t bottlenecked by intelligence anymore, it’s bottlenecked by human attention. He frames missions as the answer to that mismatch: humans decide what to build, then an agent system keeps executing while they go do something else. He also grounds it in his own lineage, from dev tools at Block to Goose, the open-source coding agent later donated to the Agentic AI Foundation.
He says the multi-agent landscape is “a bit of a mess,” then offers a cleaner taxonomy of five frontier patterns: delegation, creator-verifier, direct communication, negotiation, and broadcast. The useful distinction is that each solves a different coordination problem — from sub-agents doing discrete tasks to validators acting like fresh reviewers without the builder’s “cost bias.” Broadcast gets less hype, but he calls it essential for long-running coherence.
Factory’s system combines four of those patterns into a single workflow it calls a mission. The architecture has an orchestrator for planning, workers for implementation, and validators for verification, with the orchestrator producing a plan, milestones, and a “validation contract” that defines what done means before any coding starts. That’s the key move: this isn’t one agent with a giant context window, it’s an ecosystem held together by structured handoffs and shared state.
Luke describes a familiar failure mode: an agent writes code, then writes tests that pass, but those tests just confirm the decisions the agent already made. His line is memorable: tests written after implementation don’t catch bugs, they confirm decisions. Missions try to break that loop by creating the validation contract up front, then running both a scrutiny validator — tests, lint, type checks, dedicated review agents — and a user-testing validator that actually boots the app and interacts with it like a QA engineer.
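As a rough illustration of the contract-first idea, here is a minimal Python sketch. The talk summary does not describe Factory's actual data model, so the names `Assertion`, `ValidationContract`, `evaluate`, and `milestone_passes` are all hypothetical. The point it tries to show is that the assertions exist before any code does, and that an assertion nobody has a check for counts as a failure rather than a pass.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Assertion:
    """One checkable statement about finished behavior (hypothetical shape)."""
    id: str
    description: str
    kind: str  # "scrutiny" (tests, lint, types) or "user_testing" (drive the app)


@dataclass
class ValidationContract:
    """Written during planning, before any implementation exists."""
    assertions: List[Assertion] = field(default_factory=list)

    def evaluate(self, checks: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
        # An assertion with no registered check is recorded as failing, so
        # unimplemented behavior cannot pass by omission.
        results: Dict[str, bool] = {}
        for assertion in self.assertions:
            check = checks.get(assertion.id)
            results[assertion.id] = bool(check()) if check is not None else False
        return results


def milestone_passes(results: Dict[str, bool]) -> bool:
    # An empty result set is treated as a failure, not a vacuous pass.
    return bool(results) and all(results.values())
```

In this sketch the checks are plain callables supplied later by whatever plays the validator role; the contract itself never changes once implementation starts, which is what keeps the tests from simply confirming the builder's decisions.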
For long missions, an agent’s memory can’t be trusted, so workers are forced to write down exactly what happened: what they completed, what they skipped, what commands they ran, the exit codes, what issues they found, and whether they followed the orchestrator’s procedures. Luke says that’s how the system “self-heals” at milestone boundaries, by scoping corrective work from explicit records instead of hoping the next agent remembers the past. That structure is what enabled their longest mission to run for 16 days, and the team believes 30 days is possible.
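The handoff record described above could look something like the following Python sketch. The field names mirror the items listed in the summary, but the classes `CommandRun` and `Handoff` and the function `corrective_work` are assumptions made for illustration, not Factory's schema. It shows how follow-up work at a milestone boundary can be derived purely from written records.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CommandRun:
    command: str
    exit_code: int


@dataclass
class Handoff:
    """What a worker writes down when it finishes (field names are illustrative)."""
    feature: str
    completed: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)
    commands: List[CommandRun] = field(default_factory=list)
    issues: List[str] = field(default_factory=list)
    followed_procedures: bool = True


def corrective_work(handoffs: List[Handoff]) -> List[str]:
    """Derive follow-up tasks at a milestone boundary from written records only."""
    tasks: List[str] = []
    for h in handoffs:
        tasks += [f"{h.feature}: finish skipped item '{s}'" for s in h.skipped]
        tasks += [
            f"{h.feature}: investigate '{c.command}' (exit {c.exit_code})"
            for c in h.commands
            if c.exit_code != 0
        ]
        tasks += [f"{h.feature}: resolve issue '{i}'" for i in h.issues]
        if not h.followed_procedures:
            tasks.append(f"{h.feature}: re-run under the orchestrator's procedures")
    return tasks
```

Nothing here relies on any agent recalling earlier context: a clean handoff yields no tasks, and every skipped item, non-zero exit code, or reported issue becomes an explicit item for the next worker.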
The obvious idea is to throw 10 agents at the problem, but Luke says that fell apart in software development because agents step on each other’s changes, duplicate work, and make inconsistent architectural calls. Missions therefore run features serially, while only parallelizing read-only work like code search, API research, and validator code review. It looks slower on paper, but he says the lower error rate compounds over multi-day runs.
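One way to picture "serial features, parallel read-only work" is the Python sketch below. It uses the standard library's `concurrent.futures.ThreadPoolExecutor`; the functions `run_feature` and `run_mission` and the callables passed to them are hypothetical stand-ins for agents, not anything taken from Factory's system.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

ReadOnlyTasks = Dict[str, Callable[[], str]]
Implement = Callable[[str, Dict[str, str]], str]
Feature = Tuple[str, ReadOnlyTasks, Implement]


def run_feature(name: str, read_only_tasks: ReadOnlyTasks, implement: Implement) -> str:
    # Read-only work (search, research, review) cannot conflict with itself,
    # so it is safe to fan out across threads.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {key: pool.submit(task) for key, task in read_only_tasks.items()}
        findings = {key: future.result() for key, future in futures.items()}
    # The write step runs alone, after all read-only findings are collected.
    return implement(name, findings)


def run_mission(features: List[Feature]) -> List[str]:
    # Features run strictly one after another, so no two writers overlap.
    return [run_feature(name, tasks, impl) for name, tasks, impl in features]
```

The design trade-off matches the one described in the talk: total throughput is lower than a full fan-out, but only one step at a time is allowed to change the codebase, which removes merge conflicts and duplicated work by construction.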
Because chat UIs break down over days-long jobs, Factory built Mission Control so you can glance at progress, budget burn, active workers, validator findings, and course corrections — or just go hang out with friends. He also argues there’s no single best model for planning, implementation, and validation, calling the skill of assigning them “droid whispering,” and says using a different provider for validation can reduce shared-model bias. In their Slack clone example, validation never passes on the first try, about 60% of time and tokens go to implementation, roughly 50% of final lines are tests, and 90% of code ends up covered.
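The per-role model assignment can be sketched as a small routing table with a sanity check. This is only an illustration under assumptions: the `ModelChoice` class, the `check_routing` function, and any provider or model names you put into the table are hypothetical, and the summary does not say how Factory configures routing.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ModelChoice:
    provider: str  # placeholder label, not a real vendor name
    model: str     # placeholder label, not a real model name


ROLES = ("planning", "implementation", "validation")


def check_routing(routing: Dict[str, ModelChoice]) -> List[str]:
    """Return warnings for a role -> model table (roles taken from the talk)."""
    warnings: List[str] = []
    for role in ROLES:
        if role not in routing:
            warnings.append(f"no model assigned for {role}")
    impl = routing.get("implementation")
    val = routing.get("validation")
    # Same-provider validation risks sharing the builder's blind spots.
    if impl is not None and val is not None and impl.provider == val.provider:
        warnings.append("validation shares a provider with implementation")
    return warnings
```

For example, a table that assigns "provider-a" to both implementation and validation would produce one warning, while moving validation to "provider-b" would produce none.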
Luke closes with the “bitter lesson” anxiety every multi-agent builder has: what if the next model release obsoletes your architecture? Factory’s answer was to keep orchestration mostly in prompts and skills — about 700 lines of text, plus thin deterministic bookkeeping — so the system gets better as models do. His final economic claim is simple: if five engineers could once sustain 10 workstreams, missions might push that to 30, while humans stay focused on architecture and product decisions instead of babysitting execution.