AI Engineer · 1h 18m

Skill Issue: How We Used AI to Make Agents Actually Good at Supabase — Pedro Rodrigues, Supabase

TL;DR

  • Supabase improved agent performance with skills, not just tools — Pedro Rodrigues says the “secret sauce” for making agents useful on a multi-product platform like Supabase has been skills: markdown-based workflow/context packages that complement MCP rather than replace it.

  • The big idea is progressive disclosure — instead of dumping full docs or every tool into context, a skill exposes just its front matter first, then lets the agent pull the rest only when needed, which Pedro compares to a “book” where skill.md is the index and referenced files are chapters.

  • The live demo showed exactly why generic coding agents miss product-specific gotchas — Claude successfully created a PostgreSQL view for department stats, but forgot security_invoker, which meant the new view bypassed row-level security and exposed HR data to everyone, including “Alice.”

  • A tiny skill changed the agent’s behavior in the right direction — after installing a Supabase security skill via Vercel’s skills package, the same prompt led Claude to generate the view with the security_invoker flag, proving the extra context was actually steering the model.

  • Pedro’s practical architecture is MCP for access, skills for guidance — his recommendation for things like querying large databases is to use MCP tools for the actual authenticated integration, then use skills to tell the agent how to use those tools, including chunking and progressive loading.

  • Evals matter, but writing them is trickier than writing the skill — he demoed an eval pipeline that draws on OpenAI's guidance and the Agent Skills open standard, then immediately hit a misleading result because the grader checked the wrong metadata, underscoring that testing nondeterministic agents is still early and fragile.

The Breakdown

A workshop rebrand, and why Pedro cares about “DAX”

Pedro opens by joking that he rebranded the talk from “Skill Issue” to “Level Up Your Skills,” saving the spicier title for his keynote the next day. He introduces himself as an AI tooling engineer at Supabase focused on making the product “agentic friendly,” framing his job not as DX but “DAX” — developer experience for agents.

What skills actually are, beyond the skill.md file

He quickly grounds the room: skills are folders with instructions, reference files, and sometimes scripts for repeated workflows. The important design pattern is progressive disclosure — the agent first sees just the front matter, especially the name and description, then decides whether to load the rest, which Pedro describes as a book where skill.md is the “index on steroids.”

Skills vs. MCP: stop treating them like rivals

Pedro spends time on a common misconception: skills and MCP are not substitutes in a winner-take-all battle. His rule of thumb is simple: use MCP for integrations and remote/authenticated actions, especially if the agent can’t rely on local bash; use skills to provide the workflow, extra context, and operating instructions that don’t fit in tool descriptions.
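
As one concrete example of guidance that fits in a skill but not in a tool description, the chunking and progressive-loading advice from the TL;DR could amount to telling the agent to page through large tables via the MCP query tool rather than selecting everything at once. The query below is an illustrative sketch, not something shown in the talk; the table and column names are assumptions.

```sql
-- A keyset-paginated read a skill might tell the agent to issue through an
-- MCP execute-SQL tool, instead of a full-table SELECT that floods the context:
select id, created_at, payload
from events
where id > :last_seen_id  -- resume after the last row already loaded
order by id
limit 100;                -- small pages keep each result inside the context window
```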

Testing markdown sounds weird, but the answer is evals

He shifts into testing, arguing that a skill file can be tested much like code — unit-style, integration-style, or end-to-end — except now there’s an LLM in the loop. Citing OpenAI’s blog on systematically evaluating agent skills, he lays out an eval-driven loop: define metrics, write the skill, test manually or automatically, grade behavior, then iterate.

The demo app: a fake HR dashboard with real security mistakes

The workshop app is a small Next.js performance review system built on Supabase, with four employees and a reports page the agent needs to implement. The task sounds simple — create a department stats view showing headcount and average salary — and Claude, using Supabase’s local MCP server with about 20 tools, handles the schema change and UI wiring without much drama.
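
For orientation, a straightforward version of that view is sketched below; the employees table and its columns are assumptions for illustration, not the workshop's actual schema.

```sql
-- A plain department stats view, the shape of what the task asks for
-- (table and column names are illustrative):
create view department_stats as
select department,
       count(*)    as headcount,
       avg(salary) as avg_salary
from employees
group by department;
```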

The “looks right” failure: Claude ships a data leak

Then Pedro does the human thing every demo needs: he logs in as different users. As HR, the reports look fine, but as Bob the engineering manager he can still see HR and product data, and even worse, regular employee Alice can too. The reason is that PostgreSQL evaluates a view with its owner's privileges by default, so a view bypasses row-level security on its base tables unless you set security_invoker, a subtle product-specific gotcha the model missed.
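
To see what that default actually bypasses, here is a minimal sketch of the kind of row-level security such an app relies on; the policy, table, and column names are assumptions, not taken from the demo.

```sql
-- Illustrative RLS on the base table (names are assumptions):
alter table employees enable row level security;

create policy "employees read their own row" on employees
  for select
  using (auth.uid() = user_id);  -- Supabase helper: the signed-in user's id

-- By default, Postgres checks queries that go through a view against the
-- view owner's privileges, and views in the public schema are typically owned
-- by a privileged role, so selecting from department_stats skips this policy.
```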

A Supabase security skill nudges the model the right way

Pedro installs a prewritten skill, named something like Supabase security, using Vercel's skills package, and points out a small prompt-engineering trick: descriptions that start with the verb "use" seem more likely to get loaded by Claude. Running the same task again, the model now includes the security_invoker flag in the generated SQL. That is the point of the exercise, even if the rest of the messy live app still needs troubleshooting.
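
As a sketch of the corrected output, the only meaningful difference is the view option, which is available since PostgreSQL 15; the schema names are still illustrative.

```sql
-- Same view, now evaluated with the caller's privileges, so RLS applies:
create or replace view department_stats
  with (security_invoker = true)
as
select department,
       count(*)    as headcount,
       avg(salary) as avg_salary
from employees
group by department;

-- Equivalent fix for a view that already exists:
-- alter view department_stats set (security_invoker = true);
```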

Production reality: evals are essential, and still messy

To close, he demos a lightweight eval harness inspired by the Agent Skills open standard: an eval.json, two conditions (with and without the skill), a reset script, and headless Claude runs. The punchline is very honest — his grading logic checked the wrong thing, so the eval reported the opposite of what the manual test showed — and Pedro uses that failure to make the real point: the hard part isn’t just writing skills, it’s defining representative scenarios and trustworthy graders for nondeterministic systems.
