Mergeable by default: Building the context engine to save time and tokens — Peter Werry, Unblocked
TL;DR
Context, not model intelligence, is now the bottleneck — Peter Werry argues coding models are racing toward “close to perfect” code intelligence, but without a context engine they still fall into “doom loops,” miss legacy constraints, and need humans to constantly re-steer them.
A context engine is not just RAG over docs or a pile of MCP servers — Unblocked learned the hard way that access doesn’t equal understanding: agents need relationships, organizational memory, expert weighting, conflict handling, and permission-aware retrieval across code, PRs, Slack, incident reports, and docs.
Bigger context windows don’t solve organizational understanding — Werry points out that even Gemini-style million-token windows are still tiny compared with the real context of a large engineering org, and dumping more tokens into the model doesn’t resolve truth conflicts or explain why past decisions were made.
The key failure mode is “satisfaction of search” — Borrowing a radiology term, he says agents often stop after finding the first plausible answer in Notion or code, while the crucial detail is hiding in an old Slack thread, incident report, or rejected PR discussion.
The payoff can be dramatic in both time and token cost — In one internal benchmark, a large task took 2.5 hours and 21 million tokens without the context engine, versus 25 minutes and 10 million tokens with it, largely because the agent stopped making the same historical mistakes.
AI-generated code should feel like it was written by a teammate who’s been there 20 years — That’s Werry’s north star for context engines: not just code that compiles, but code that reflects the org’s best practices, reviewers’ habits, historical scars, and private access boundaries.
The Breakdown
From “you are the context engine” to autonomous agents
Werry opens by defining context engineering as supplying all the context an agent needs — and none of the context it doesn’t — so it can work in line with how your org actually operates. He reminds the room that only a few years ago, the human was doing all of that by hand: pasting tickets, redirecting the model, and saying some version of “not the JavaScript dummy, look at the Python.”
The adoption curve, and why humans are becoming the bottleneck
He walks through the evolution from 8k-token autocomplete to today’s parallel agents, MCP, and “YOLO mode” background agents. The big point: as agents get more capable, the painful part becomes human context switching — juggling multiple agents and tasks until your own brain becomes the slowest system in the loop.
The myths: naive RAG, MCP soup, and giant context windows
Werry dismantles three common beliefs: that naive RAG over docs is a context engine, that wiring up a bunch of MCP servers is enough, and that bigger windows will solve everything. Gemini’s million-token window was great for “needle in a haystack,” he says, but not for reasoning across messy sources, resolving contradictions, or understanding what an org was actually trying to do.
“Satisfaction of search” is the silent killer
His most memorable analogy comes from radiology: a radiologist spots one finding on an X-ray that explains the symptoms and stops searching, missing the cancer. Agents do the same thing — they find a plausible answer in code or docs, then miss the real gold in a buried Slack thread, incident report, or old rejected approach.
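One way to counter this failure mode is to force retrieval to exhaust every source type before the agent is allowed to answer, rather than stopping at the first plausible hit. A minimal sketch of that idea (the source names, the `search` stub, and its fake index are all hypothetical, not Unblocked's actual implementation):

```python
# Source types an engineering org's context might live in.
SOURCE_TYPES = ["code", "docs", "slack", "incidents", "rejected_prs"]

def search(query: str, source: str) -> list[str]:
    """Stand-in for a real per-source retriever (hypothetical fake index)."""
    fake_index = {
        "code": ["plausible answer in code"],
        "incidents": ["root cause from an old incident report"],
    }
    return fake_index.get(source, [])

def exhaustive_retrieve(query: str) -> list[str]:
    """Consult every source type; never stop at the first match."""
    hits = []
    for source in SOURCE_TYPES:
        hits.extend(search(query, source))
    return hits

print(exhaustive_retrieve("why is the retry limit 3?"))
```

Here the "plausible answer in code" would have satisfied a naive agent, but the exhaustive pass also surfaces the incident report — the kind of buried detail Werry says agents routinely miss.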
What a real context engine has to do
He lays out the job: unify system context, build relationships between data, resolve conflicts, preserve permissions, personalize retrieval, and stay token-efficient. A nice concrete example is Slack: Unblocked can use private-channel information in an answer, but only if the asker has access, and those answers stay private instead of leaking into public responses.
Lessons from building Unblocked the hard way
Three hard-earned lessons: optimizing for access instead of understanding doesn’t work, hiding unresolved conflicts is worse than surfacing them, and caching answers is a trap because code, docs, and motivations keep changing. He also explains their progression from naive recency-based truth ranking to something more nuanced: code matters, but where the system is going can matter more than where it is today.
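The shift away from naive recency ranking can be sketched as a scoring function that blends recency decay with expert weighting and a boost for sources describing where the system is headed. Every weight and constant here is an illustrative assumption, not Unblocked's actual formula:

```python
import math

def truth_score(age_days: float, author_expertise: float,
                is_planned_direction: bool) -> float:
    """Blend recency decay with expert weighting; boost sources that describe
    where the system is going, not just where it is (weights hypothetical)."""
    recency = math.exp(-age_days / 180.0)          # decays over ~6 months
    direction_boost = 1.5 if is_planned_direction else 1.0
    return recency * (0.5 + author_expertise) * direction_boost

# A 3-month-old design doc by a domain expert describing the planned direction
# can outrank a fresh note by a non-expert about the current state.
design_doc = truth_score(age_days=90, author_expertise=0.9, is_planned_direction=True)
fresh_note = truth_score(age_days=2, author_expertise=0.1, is_planned_direction=False)
print(design_doc > fresh_note)
```

Under a pure recency ranking the fresh note always wins; adding the expertise and direction terms is what lets "where the system is going" outweigh "where it is today."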
Where teams get the biggest payoff
The best returns show up in planning and review, where organizational context matters more than just syntax or security checks. Werry also highlights ticket enrichment, triage, incident management with Sentry and Datadog, and engineering support channels in Slack, where auto-answering can save teams a ton of repetitive work.
The workshop: build a social graph and “bottle the expert”
The hands-on part centers on a local social graph builder that maps who reviews whose PRs, identifies experts by code area, and visualizes team structure. Werry explains why this matters: the graph isn’t just an org chart, it’s a pivot point for retrieval — a way to “bottle the expert” so the agent can inherit the judgment of the people who actually shaped that part of the codebase.
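The core of the workshop's graph builder can be sketched in a few lines: count reviewer-to-author edges from PR records, tally reviews per code area, and nominate the heaviest reviewer in each area as its likely expert. The PR records and field names below are made up for illustration:

```python
from collections import Counter, defaultdict

# Hypothetical PR records: (author, reviewer, top-level area touched).
prs = [
    ("alice", "bob", "billing"),
    ("carol", "bob", "billing"),
    ("bob", "alice", "auth"),
    ("dave", "alice", "auth"),
    ("alice", "bob", "billing"),
]

review_edges = Counter()              # (reviewer, author) -> review count
area_reviews = defaultdict(Counter)   # code area -> reviews per reviewer

for author, reviewer, area in prs:
    review_edges[(reviewer, author)] += 1
    area_reviews[area][reviewer] += 1

# "Bottle the expert": whoever reviews an area most is its likely expert.
experts = {area: counts.most_common(1)[0][0]
           for area, counts in area_reviews.items()}
print(experts)  # {'billing': 'bob', 'auth': 'alice'}
```

The resulting `experts` map is the pivot point for retrieval Werry describes: when ranking conflicting sources about `billing`, content authored or approved by its expert can be weighted above everything else.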
Q&A: what Unblocked is, what people actually use, and the future
In Q&A, he says a context engine can show up as an API, CLI, dashboard, Slack bot, and MCP server — basically all of the above. The most-used integrations today are Claude Code first, then Cursor, with a surprisingly large share of Claude Desktop usage. The product direction is clear: fully autonomous agents need context engines, or they'll just automate expensive mistakes faster.