
Playbook
Tasteful Skills
“Tasteful Skills” argues that the best agent skills are not documentation or best-practice lists.
Anthropic thinks AI platforms are moving from “API primitives” to “outcome machines” — Angela described the trajectory from a simple completion endpoint to stateful agents with memory, tools, and infrastructure, with the long-term goal being you specify an outcome and a budget and Claude figures out the rest.
Managed agents exist because infrastructure, not prompting, is where teams actually break — Caitlyn said most teams prototype fast with Claude Code, the agent SDK, or even “a couple Mac minis,” but hit the real wall when they need secure sandboxing, transcript storage, long-running sessions, and reliable cloud execution in production.
Model hot-swapping is getting less realistic as harnesses become model-specific — Angela argued that the old “generic harness, interchangeable model” pattern is weakening because newer models have different strengths and primitives, so the real optimization now is pairing a harness with a model and hill-climbing the combination.
Tiny platform choices create huge path dependence in model behavior — The team said decisions like whether Claude uses file systems, skills, or particular tool patterns can radically change outcomes, and Anthropic has seen dramatically different eval results across harness variants for the same feature, including memory.
The best internal agent use cases are boring, high-friction workflows, not sci-fi demos — Their examples included legal reviewing marketing copy, end-to-end internal software platforms like Stripe’s “minions,” and team agents in Slack that automate shared processes with humans still in the loop.
A year from now, they want Claude to understand itself well enough to self-assemble agents — Angela’s vision is that Claude will choose the model, spin up sub-agents, and write the architecture on the fly, while Caitlyn’s complementary point was that the platform then has to “seriously scale” to support constantly running, self-reconfiguring agents.
Dan opens by framing the shift in AI platforms: GPT-3 era APIs were just prompt in, completion out, while Claude’s platform now looks more like “a Claude on a computer” with memory and tools. Angela agrees and says the through line is simple: as models get more autonomous, the platform has to add richer abstractions so users can get better outcomes with less work.
Caitlyn explains that Claude Managed Agents are built on the same core primitives Anthropic exposes directly: the Messages API, built-in tools, code execution, web search, and sandboxes. Their move was to bundle the strongest pieces into a “harness” and infrastructure layer so people don’t have to reinvent the same stack every time.
Dan describes Every’s own setup — Claude running in loops on Mac minis and in Python files — and wonders if builders should just wait for Anthropic to ship the hard stuff. Angela says that instinct is valid: Anthropic built managed agents after repeatedly standing up autonomous cloud agents internally and realizing they were done rebuilding the same painful infrastructure over and over.
Dan raises the fear directly: if his team adopts managed agents, do they lose flexibility versus a generic setup that can swap Claude for GPT or Gemini? Angela says that fear is real, but the industry is moving away from ultra-generic harnesses because newer models reward tight coupling — the best results often come from optimizing the harness-plus-model combo, not from treating models as plug-compatible parts.
One of the most interesting stretches is Angela’s point that small primitive choices can steer a model’s whole trajectory. Whether Claude leans on file systems, skills, or certain tool patterns may sound like a footnote, but those decisions can lock in distinct capabilities; she says even Anthropic’s own memory experiments showed harnesses performing “drastically differently” on evals.
Caitlyn says the quick-start UI wasn’t just for nontechnical users, but to help anyone grasp the primitives fast. The actual audience spans internal company automation and product teams building agents for customers, and the real pain isn’t usually harness engineering — it’s productionizing the thing once it works, with long-running async jobs, sandbox failures, persistent state, and scaling headaches.
Their most grounded examples are internal: a legal-review agent that pre-screens marketing copy, company-wide coding platforms, and Slack-based team agents with shared context. The key insight is that once agents move from individual productivity to team workflows, they need cloud infrastructure, shared ownership, and interfaces where humans can still review, approve, and tweak behavior.
Angela says multi-agent orchestration gets exciting when it becomes “Lego-like”: advisor/executor splits, adversarial pairs, swarms for bug hunting, and architectures tuned for deep or wide research. Looking ahead, her big bet is that Claude will get good enough at understanding itself to choose models, spawn sub-agents, and build the right architecture on the fly; Caitlyn’s answer is the practical counterpart — if that world arrives, the platform’s real job is making sure it scales without becoming the bottleneck.
Share
Keep Reading
The Weekly Echo. The inbox-shaped summary of what mattered.
New editorials announced here.

Playbook
“Tasteful Skills” argues that the best agent skills are not documentation or best-practice lists.

Playbook
Learn how tasteful prompting helps you move beyond generic AI output by shaping context, style, and judgment from the start.

Playbook
OpenAI shipped /goal for the Codex CLI. It turns a prompt into a persisted, self-continuing contract.