
Playbook
Tasteful Skills
“Tasteful Skills” argues that the best agent skills are not documentation or best-practice lists.
Context engineering is mostly search orchestration — Leonie Monigatti’s core claim is that context engineering is “about 80% agentic search,” because the hard part is deciding what actually makes it from files, databases, web, and memory into the model’s context window.
Vanilla agentic RAG breaks fast in the real world — a simple semantic search demo over AI Engineer conference sessions worked for “regulatory constraints,” then failed on “GDPA” by retrieving irrelevant talks like DeepMind’s Gemma, showing how brittle one-tool retrieval can be.
General-purpose tools raise the ceiling but increase failure modes — letting an agent write full Elasticsearch ESQL queries unlocked filtering and aggregations like counting 27 sessions on April 8th, but also introduced syntax mistakes like using SQL-style % instead of ESQL * wildcards.
Agent skills are a practical fix for complex tool use — by forcing the agent to load an Elasticsearch ESQL skill before querying, Leonie got GPT-5.4 mini to self-correct and find Samuel’s GDPR session at 10:40, instead of failing with zero results.
The shell tool is absurdly versatile, but not a silver bullet — with just terminal access, GPT-5.4 Nano could grep local session files, chain synonyms like “compliance,” “governance,” and “GDPR,” and often recover semantically relevant results, though in a clunky and inefficient way.
The best agent stacks mix low-floor and high-ceiling tools: Leonie recommends specialized tools for easy, reliable actions and general-purpose tools like bash or query execution for edge cases. She also recommends logging agent behavior, so you can spot when questions take four or five tool calls each, a sign that you need a better interface.
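The logging advice can be made concrete with a small tool-call log. The sketch below is a minimal, hypothetical version in plain Python. It records each tool call per question and flags questions that needed four or more calls, the threshold mentioned in the talk. The names `ToolCallLog`, `record`, `flag_heavy_questions`, and `tool_usage` are illustrative and do not come from any library.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ToolCallLog:
    """Hypothetical in-memory log of agent tool calls, grouped by question."""

    calls: dict = field(default_factory=lambda: defaultdict(list))

    def record(self, question: str, tool_name: str, args: dict) -> None:
        # Append one tool call to the history of the question it served.
        self.calls[question].append((tool_name, args))

    def flag_heavy_questions(self, threshold: int = 4) -> dict:
        # Questions whose call count meets the threshold; repeated multi-call
        # patterns hint that a more specialized tool is missing.
        return {q: len(c) for q, c in self.calls.items() if len(c) >= threshold}

    def tool_usage(self) -> dict:
        # How often each tool was called across all logged questions.
        counts: dict = defaultdict(int)
        for history in self.calls.values():
            for tool_name, _ in history:
                counts[tool_name] += 1
        return dict(counts)
```

In practice you would persist these records instead of keeping them in memory. Even this simple version, though, shows which questions send the agent through long grep or query chains.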
Leonie opens with a spicy but memorable framing: context engineering is really about that tiny arrow between “context sources” and the “context window,” and that arrow is powered by search. Her hot take is that the whole discipline is “about 80% agentic search,” because the real challenge isn’t having context — it’s selecting the right context from files, databases, web, skills, and memory.
She quickly traces the last three years: classic RAG used a fixed retrieval pipeline, often feeding the user prompt straight into vector search whether retrieval was useful or not. Agentic RAG improved that by turning retrieval into a tool the model can call when needed, but Leonie points out that real systems still span many context sources, not just one database.
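The difference between the two patterns is where the retrieval decision lives. In this minimal sketch, a stubbed retriever and a stubbed `plan_query` function stand in for the vector store and the model; both are hypothetical. Classic RAG always retrieves on the raw prompt. Agentic RAG retrieves only when the model decides a tool call is useful, and it can rewrite the query first.

```python
from typing import Callable, List, Optional

Retriever = Callable[[str], List[str]]


def classic_rag_context(prompt: str, retrieve: Retriever) -> List[str]:
    # Fixed pipeline: the raw user prompt always goes into retrieval,
    # whether or not the question needs outside context.
    return retrieve(prompt)


def agentic_rag_context(
    prompt: str,
    retrieve: Retriever,
    plan_query: Callable[[str], Optional[str]],
) -> List[str]:
    # Retrieval is a tool: the model (stubbed as `plan_query`) decides whether
    # to call it and may rewrite the prompt into a better search query.
    query = plan_query(prompt)
    if query is None:  # model chose to answer without retrieval
        return []
    return retrieve(query)
```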
From search_files and skill loaders to semantic DB search, SQL/ESQL execution, web search, memory tools, and bash/exec/shell tools, she lays out a crowded tool landscape. The shell tool gets special attention because it can touch almost everything via CLIs and curl, but her takeaway is clear: search is hard enough that there is no single silver bullet, and teams need a curated stack.
Leonie says the happy path for agentic search looks simple only on slides; in practice, agents fail by not calling any tool, calling the wrong one, or generating bad parameters. She sounds almost exasperated talking about flimsy tool descriptions — everyone knows they matter, yet people still ship one-line docs and then wonder why the agent chooses web search over the database.
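Once tools have explicit specs, these failure modes are easy to detect mechanically. This sketch assumes a hypothetical tool-call format: a dict with `name` and `args`, or `None` when the model answers without a tool. It also assumes a hand-written spec per tool; this is not LangChain's API. The descriptions say *when* to use each tool, not just what it does. This addresses the one-line-description problem the talk complains about.

```python
from typing import Optional

# Hypothetical tool specs; the descriptions steer the model toward the
# session database instead of web search for conference questions.
TOOL_SPECS = {
    "search_sessions": {
        "description": (
            "Search the conference session database. Use this for any "
            "question about talks, speakers, times, or dates at the "
            "conference. Do NOT use web search for these questions."
        ),
        "required": {"query": str},
    },
    "web_search": {
        "description": (
            "Search the public web. Use only for information that is not "
            "about this conference's sessions."
        ),
        "required": {"query": str},
    },
}


def classify_tool_call(call: Optional[dict], expected_tool: Optional[str] = None) -> str:
    """Label a proposed tool call: ok, no_tool_call, wrong_tool, or bad_parameters."""
    if call is None:
        return "no_tool_call"
    name = call.get("name")
    spec = TOOL_SPECS.get(name)
    # Unknown tools, or a known tool that differs from the one an eval expects.
    if spec is None or (expected_tool is not None and name != expected_tool):
        return "wrong_tool"
    args = call.get("args", {})
    for param, expected_type in spec["required"].items():
        value = args.get(param)
        if not isinstance(value, expected_type) or value == "":
            return "bad_parameters"
    return "ok"
```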
Using LangChain, GPT-5.4 Nano, Elasticsearch, and conference session data, she builds a minimal semantic search tool over titles and descriptions. It does fine on “Which sessions discuss regulatory constraints in AI systems?” but falls apart on “Which sessions should I visit to learn more about GDPA?”, returning irrelevant results like Gemma because semantic similarity is the wrong instrument for exact-ish keyword lookup.
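A tool like this reduces to a single semantic query. The sketch below builds an Elasticsearch request body using the `semantic` query, which targets a `semantic_text` field in recent Elasticsearch releases. The field name `content` (title plus description), the `_source` fields, and `search_fn` are assumptions. `search_fn` stands in for whatever wrapper around your client actually sends the request. Injecting it keeps the sketch runnable without a cluster.

```python
from typing import Callable, List

SearchFn = Callable[[dict], dict]


def build_semantic_query(question: str, k: int = 5) -> dict:
    # `semantic` query over a hypothetical semantic_text field "content"
    # that holds each session's title and description.
    return {
        "size": k,
        "query": {"semantic": {"field": "content", "query": question}},
        "_source": ["title", "speaker", "date", "time"],
    }


def search_sessions(question: str, search_fn: SearchFn, k: int = 5) -> List[str]:
    """Tool body: return the titles of the k most similar sessions."""
    response = search_fn(build_semantic_query(question, k))
    return [hit["_source"]["title"] for hit in response["hits"]["hits"]]
```

This tool only ranks by embedding similarity and applies no relevance cutoff. As a result, it returns the top k hits whether or not any of them is relevant. That is how an unrelated talk can end up in the results for an acronym the embedding model does not map well.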
She replaces the narrow semantic tool with an execute-query tool that lets the model write full ESQL, switching to GPT-5.4 mini because this is harder work. The first attempt still fails — the agent uses %GDPA% instead of ESQL’s *GDPA* — and that becomes her case study for why error handling and “agent skills” matter: the skill injects syntax guidance only when needed, and after loading it, the agent correctly finds Samuel’s GDPR session and later computes that there are 27 sessions on April 8th.
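The skill-plus-error-handling pattern can be sketched as a small toolset. Everything here is hypothetical: the skill text, the `load_skill` and `execute_esql` names, the `sessions` index, and `run_query`. `run_query` stands in for a real ES|QL call that returns a list of rows. The gate mirrors the setup of requiring the ES|QL skill before any query runs. The hint targets the specific mistake from the talk: ES|QL's `LIKE` uses `*` and `?` as wildcards, not SQL's `%`.

```python
import re
from typing import Callable, List

RunQuery = Callable[[str], List[dict]]

# Hypothetical skill content, injected into context only when requested.
SKILLS = {
    "esql": (
        "ES|QL cheat sheet:\n"
        "- Start with FROM <index>, then chain commands with |.\n"
        "- LIKE wildcards are * (any characters) and ? (one character); "
        "SQL's % is not a wildcard.\n"
        '- Example: FROM sessions | WHERE title LIKE "*GDPR*" | KEEP title, time\n'
        "- Counting: FROM sessions | STATS n = COUNT(*) BY day"
    ),
}


class EsqlToolset:
    """Hypothetical pair of agent tools in which load_skill gates execute_esql."""

    def __init__(self, run_query: RunQuery):
        self.run_query = run_query
        self.loaded_skills: set = set()

    def load_skill(self, name: str) -> str:
        # Return the skill text so the agent loop can append it to context.
        if name not in SKILLS:
            return f"Unknown skill '{name}'. Available: {', '.join(sorted(SKILLS))}"
        self.loaded_skills.add(name)
        return SKILLS[name]

    def execute_esql(self, query: str) -> str:
        # Enforce "load the skill first" instead of trusting the model to do it.
        if "esql" not in self.loaded_skills:
            return "Call load_skill('esql') before writing ES|QL queries."
        try:
            rows = self.run_query(query)
        except Exception as err:  # report errors to the model instead of crashing
            return f"Query failed: {err}. Check the syntax and retry."
        if not rows and re.search(r'LIKE\s+"[^"]*%', query, re.IGNORECASE):
            # Zero rows plus a SQL-style pattern suggests the % vs * mistake.
            return "0 rows. ES|QL LIKE uses * and ?, not %; rewrite the pattern."
        return f"{len(rows)} row(s): {rows}"
```

The tool returns failures and hints as strings rather than raising exceptions. The model can then read what went wrong and correct itself on the next turn.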
Then she pivots to the “all you need is a shell tool and a file system” debate, storing conference sessions as local markdown-like files. GPT-5.4 Nano navigates folders, uses grep to find the GDPR session, and even does a funny pseudo-semantic hack on “regulatory constraints” by trying synonym chains like “compliance,” “governance,” and “GDPR” — effective enough to be impressive, but not exactly elegant.
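A bare-bones shell tool with the synonym-chain behavior might look like the sketch below, which rests on a few assumptions. `run_shell` is a hypothetical tool body that executes a command with a timeout; because `shell=True` runs arbitrary input, use it only in a sandbox. The session files are assumed to be plain text under one directory. `grep_synonym_chain` simply automates what the model did by hand: it tries terms in order until `grep` finds a match.

```python
import shlex
import subprocess
from typing import List, Optional, Tuple


def run_shell(command: str, timeout: float = 10.0) -> Tuple[int, str, str]:
    """Hypothetical shell tool: run a command, return (exit code, stdout, stderr)."""
    # shell=True allows pipes such as `grep ... | head`; sandbox this in practice.
    proc = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return proc.returncode, proc.stdout, proc.stderr


def grep_synonym_chain(
    terms: List[str], sessions_dir: str
) -> Optional[Tuple[str, List[str]]]:
    """Try each term with recursive, case-insensitive grep; return the first hit."""
    for term in terms:
        cmd = f"grep -ril -- {shlex.quote(term)} {shlex.quote(sessions_dir)}"
        code, out, _ = run_shell(cmd)
        if code == 0:  # grep exits 0 when at least one line matched
            return term, sorted(out.splitlines())
    return None
```

Each miss costs a full tool call. This is the clunky, inefficient side of shell-only search described above.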
To show how shell-based agents can be extended, she installs the Jina CLI and teaches the model when to use grep for exact matches versus jina for semantic search, which lets it find the right regulatory-constraints session on the first try. Her closing advice is the most practical part of the talk: build for a “low floor” with specialized tools, a “high ceiling” with general-purpose ones, and if you don’t know user behavior yet, start broad, log everything, and let the agent’s repeated mistakes tell you which tools to build next.
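In the talk, the model itself chooses between `grep` and the Jina CLI based on instructions. The rule behind those instructions can be roughly approximated with a heuristic. The sketch below is a hypothetical helper and does not invoke Jina or assume anything about its flags. Acronyms, quoted phrases, and identifier-like tokens go to exact matching; descriptive phrases go to semantic search.

```python
import re

# Hypothetical routing rule mirroring "grep for exact matches,
# semantic search for concepts".
EXACT_PATTERNS = [
    re.compile(r'^".+"$'),                # explicitly quoted phrase
    re.compile(r"^[A-Z][A-Z0-9]{1,9}$"),  # acronym such as GDPR
    re.compile(r"^\w*[_\d]\w*$"),         # identifier-like token, e.g. es_ql or v2
]


def route_search(query: str) -> str:
    """Return "exact" (use grep) or "semantic" (use an embedding search)."""
    q = query.strip()
    if any(p.match(q) for p in EXACT_PATTERNS):
        return "exact"
    return "semantic"
```

A fixed rule like this is mainly useful as a baseline to log and compare against. Describing both tools and letting the model decide, as in the talk, handles far more phrasings.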