AI Engineer · May 8, 2026 · 1h 3m

Agentic Search for Context Engineering — Leonie Monigatti, Elastic

TL;DR

Context engineering is mostly search orchestration: Leonie Monigatti’s core claim is that context engineering is “about 80% agentic search,” because the hard part is deciding what actually makes it from files, databases, web, and memory into the model’s context window.
Vanilla agentic RAG breaks fast in the real world: a simple semantic search demo over AI Engineer conference sessions worked for “regulatory constraints,” then failed on “GDPR” by retrieving irrelevant talks like DeepMind’s Gemma, showing how brittle one-tool retrieval can be.
General-purpose tools raise the ceiling but increase failure modes: letting an agent write full Elasticsearch ESQL queries unlocked filtering and aggregations like counting 27 sessions on April 8th, but also introduced syntax mistakes like using SQL-style % instead of ESQL * wildcards.
Agent skills are a practical fix for complex tool use: by forcing the agent to load an Elasticsearch ESQL skill before querying, Leonie got GPT-5.4 mini to self-correct and find Samuel’s GDPR session at 10:40, instead of failing with zero results.
The shell tool is absurdly versatile, but not a silver bullet: with just terminal access, GPT-5.4 Nano could grep local session files, chain synonyms like “compliance,” “governance,” and “GDPR,” and often recover semantically relevant results, though in a clunky and inefficient way.
The best agent stacks mix low-floor and high-ceiling tools: Leonie recommends specialized tools for easy, reliable actions and general-purpose tools like bash or query execution for edge cases, plus logging agent behavior so that a pattern like four or five tool calls per question tells you when you need a better interface.

Summary

Why Leonie thinks search is the real engine of context engineering

Leonie opens with a spicy but memorable framing: context engineering is really about that tiny arrow between “context sources” and the “context window,” and that arrow is powered by search. Her hot take is that the whole discipline is “about 80% agentic search,” because the real challenge isn’t having context — it’s selecting the right context from files, databases, web, skills, and memory.

From old-school RAG to agentic retrieval

She quickly traces the last three years: classic RAG used a fixed retrieval pipeline, often feeding the user prompt straight into vector search whether retrieval was useful or not. Agentic RAG improved that by turning retrieval into a tool the model can call when needed, but Leonie points out that real systems still span many context sources, not just one database.
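The difference between the two patterns can be sketched in a few lines. This is a toy illustration, not code from the talk: `toy_retrieve`, `toy_decide`, and `CORPUS` are hypothetical stand-ins for a vector search and for the model's decision to call a tool.

```python
from typing import Callable, Optional

Retriever = Callable[[str], list[str]]

# Hypothetical two-document corpus standing in for a real index.
CORPUS = ["Agent memory patterns", "GDPR compliance for AI systems"]


def toy_retrieve(query: str) -> list[str]:
    """Rank by word overlap and always return the top hit, like a nearest-neighbor search."""
    words = set(query.lower().split())
    ranked = sorted(
        CORPUS,
        key=lambda doc: len(words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:1]


def toy_decide(prompt: str) -> Optional[str]:
    """Stand-in for the model: return a search query, or None to skip retrieval."""
    if prompt.strip().lower() in {"hi", "hello", "thanks"}:
        return None
    return prompt


def classic_rag_context(prompt: str, retrieve: Retriever) -> list[str]:
    """Fixed pipeline: the user prompt always goes straight into retrieval."""
    return retrieve(prompt)


def agentic_rag_context(
    prompt: str,
    decide_query: Callable[[str], Optional[str]],
    retrieve: Retriever,
) -> list[str]:
    """Retrieval is a tool: it only runs when the model decides to call it."""
    query = decide_query(prompt)
    return [] if query is None else retrieve(query)
```

With these stand-ins, the classic pipeline stuffs an unrelated document into the context for a plain "hello", while the agentic version retrieves nothing for that prompt.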

The search-tool jungle, and why shell isn’t enough by itself

From search_files and skill loaders to semantic DB search, SQL/ESQL execution, web search, memory tools, and bash/exec/shell tools, she lays out a crowded tool landscape. The shell tool gets special attention because it can touch almost everything via CLIs and curl, but her takeaway is clear: search is hard enough that there is no single silver bullet, and teams need a curated stack.

The three ways agents fail before your demo even starts

Leonie says the happy path for agentic search looks simple only on slides; in practice, agents fail by not calling any tool, calling the wrong one, or generating bad parameters. She sounds almost exasperated talking about flimsy tool descriptions — everyone knows they matter, yet people still ship one-line docs and then wonder why the agent chooses web search over the database.
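One way to make these three failure modes visible in an evaluation run is to label each agent attempt explicitly. The sketch below is an assumption about how that could look: the `ToolCall` shape, the `TOOLS` registry, and the tool names are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolCall:
    name: str
    arguments: dict = field(default_factory=dict)


# Hypothetical registry: tool name -> required parameter names.
TOOLS = {
    "search_sessions": {"query"},
    "web_search": {"query"},
}


def classify_attempt(call: Optional[ToolCall], expected_tool: str) -> str:
    """Label an attempt with one of the three failure modes, or 'ok'."""
    if call is None:
        return "no_tool_call"
    if call.name != expected_tool:
        return "wrong_tool"
    # A required parameter that is missing or blank counts as a bad parameter.
    bad = [
        param
        for param in TOOLS[expected_tool]
        if not str(call.arguments.get(param, "")).strip()
    ]
    return "bad_parameters" if bad else "ok"
```

Counting these labels across a set of test questions shows whether the problem is the tool description (wrong tool, no call) or the parameter schema (bad parameters).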

A semantic-search demo that works… until it doesn’t

Using LangChain, GPT-5.4 Nano, Elasticsearch, and conference session data, she builds a minimal semantic search tool over titles and descriptions. It does fine on “Which sessions discuss regulatory constraints in AI systems?” but falls apart on “Which sessions should I visit to learn more about GDPR?”, returning irrelevant results like DeepMind’s Gemma talk because semantic similarity is the wrong instrument for exact-ish keyword lookup.

Giving the agent full query power with ESQL and skills

She replaces the narrow semantic tool with an execute-query tool that lets the model write full ESQL, switching to GPT-5.4 mini because this is harder work. The first attempt still fails because the agent writes the SQL-style pattern %GDPR% instead of ESQL’s *GDPR*, and that becomes her case study for why error handling and “agent skills” matter: the skill injects syntax guidance only when needed. After loading it, the agent correctly finds Samuel’s GDPR session and later computes that there are 27 sessions on April 8th.
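The wildcard mistake is the kind of error a tool wrapper can catch and explain before the query returns zero rows. The following is a minimal sketch of that idea, not the talk's implementation: the `sessions` index and `title` field are hypothetical, and the check only covers double-quoted LIKE patterns.

```python
import re
from typing import Optional

# Matches a double-quoted LIKE pattern, e.g. LIKE "%GDPR%".
LIKE_PATTERN = re.compile(r'\bLIKE\s+"([^"]*)"', re.IGNORECASE)


def check_like_wildcards(query: str) -> Optional[str]:
    """Return a corrective hint if a LIKE pattern uses SQL-style %, else None."""
    for match in LIKE_PATTERN.finditer(query):
        pattern = match.group(1)
        if "%" in pattern:
            fixed = pattern.replace("%", "*")
            return (
                f'LIKE "{pattern}" uses SQL-style % wildcards. '
                f'ES|QL LIKE uses * for any run of characters and ? for one character. '
                f'Try LIKE "{fixed}".'
            )
    return None


# Hypothetical index and field names, for illustration only.
bad_query = 'FROM sessions | WHERE title LIKE "%GDPR%"'
good_query = 'FROM sessions | WHERE title LIKE "*GDPR*"'
```

Returning the hint as the tool result, rather than an empty result set, gives the model something concrete to correct on its next call. A skill serves a similar purpose earlier, by putting the syntax rules in front of the model before it writes the query.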

Searching the local file system with bash, grep, and a little model cleverness

Then she pivots to the “all you need is a shell tool and a file system” debate, storing conference sessions as local markdown-like files. GPT-5.4 Nano navigates folders, uses grep to find the GDPR session, and even does a funny pseudo-semantic hack on “regulatory constraints” by trying synonym chains like “compliance,” “governance,” and “GDPR” — effective enough to be impressive, but not exactly elegant.
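The synonym chain the model improvised can be written down as a simple loop. This sketch assumes sessions are stored as `.md` files under one folder; the function names and the file layout are hypothetical, and the search is a plain substring match rather than a call to grep itself.

```python
from pathlib import Path
from typing import Optional


def grep_files(root: Path, term: str) -> list[str]:
    """Case-insensitive substring search, roughly like `grep -ril term root`."""
    needle = term.lower()
    hits = []
    for path in sorted(root.rglob("*.md")):
        if needle in path.read_text(encoding="utf-8").lower():
            hits.append(path.name)
    return hits


def search_with_synonyms(
    root: Path, terms: list[str]
) -> tuple[Optional[str], list[str]]:
    """Try each term in order and stop at the first one that matches any file."""
    for term in terms:
        hits = grep_files(root, term)
        if hits:
            return term, hits
    return None, []
```

Called with a list such as `["regulatory constraints", "compliance", "governance", "GDPR"]`, it returns the first term that produced a match together with the matching file names. Each failed term costs a full pass over the files, which is the inefficiency the talk points at.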

Upgrading bash with semantic CLI tools and ending on a practical rule of thumb

To show how shell-based agents can be extended, she installs the Jina CLI and teaches the model when to use grep for exact matches versus jina for semantic search, which lets it find the right regulatory-constraints session on the first try. Her closing advice is the most practical part of the talk: build for a “low floor” with specialized tools, a “high ceiling” with general-purpose ones, and if you don’t know user behavior yet, start broad, log everything, and let the agent’s repeated mistakes tell you which tools to build next.
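The "log everything" advice can start very small. Below is a minimal sketch, assuming each user question has an identifier and each tool call is recorded by name; the `ToolCallLog` class is hypothetical, and the default threshold of four reflects the talk's "four or five tool calls per question" signal.

```python
from collections import defaultdict


class ToolCallLog:
    """Records which tools the agent called for each question."""

    def __init__(self) -> None:
        self._calls: dict[str, list[str]] = defaultdict(list)

    def record(self, question_id: str, tool_name: str) -> None:
        self._calls[question_id].append(tool_name)

    def heavy_questions(self, threshold: int = 4) -> dict[str, list[str]]:
        """Return questions that needed at least `threshold` tool calls."""
        return {
            question: calls
            for question, calls in self._calls.items()
            if len(calls) >= threshold
        }
```

Questions that show up in `heavy_questions()` again and again, with the same sequence of general-purpose calls, are candidates for a dedicated specialized tool.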
