
A one-shot chatbot breaks fast in production: Mehedi Hassan says the obvious user complaints at Granola were that “web search is too slow” and “it’s not writing follow-up emails how I normally write,” and that coaching prompts returned answers about a football coach instead of meeting coaching.
“Just add web search” is a trap — what looks like a single line of code can explode token usage, context size, and cost to around 10 pence per chat, while provider-side changes can degrade quality overnight with zero warning or control.
One prompt won’t satisfy every role — the same meeting summary that feels fine by default needs to look very different for sales, engineering, and HR, which is why Granola treats output shaping as a product problem, not just a prompting problem.
Granola built its own tracing stack to crack the LLM black box — instead of relying only on SaaS observability, they save traces to a database, wrap the AI SDK, and expose tool calls, reasoning, search steps, and cost in a UI usable by product, data, CX, and even the founder.
They sped up desktop AI iteration by turning the Electron frontend into a web shell — every PR now gets a preview link, coworkers can test variants without local setup, and Cursor even runs tests and uploads screenshots into PRs.
The real lesson is not to “one-shot better” — Hassan’s closing metaphor is that building with LLMs should feel like playing tennis with the model, tightening the feedback loop until the product feels like magic instead of a black box.
Mehedi Hassan opens by framing himself as a product engineer who has lived through jQuery, React, and now LLMs. He demos Granola as a meeting-notes app that listens to system audio and microphone audio in real time, then combines transcription with the user’s own notes so the final summary feels like an actual notepad, not just raw AI output.
He then walks through a deliberately simple chat feature: ask questions across a meeting, shared context, and recent notes, and let the model answer. The point is how quickly that falls apart in production, with users saying web search is too slow, follow-up emails don’t sound like them, and “coach me about my meetings” somehow turns into answers about a football coach.
Hassan takes aim at the idea that web search is solved by adding a provider tool in one line of code. In practice, he says, complex queries can blow up context windows and push cost to something like 10 pence per chat, and worse, providers can silently ship changes that degrade quality overnight, leaving your product exposed with no clear explanation.
He makes the same point about summaries: what counts as “good” output depends entirely on who is reading it. Sales wants a deal focus, engineering wants action items and blockers (or maybe Linear tickets), and HR wants something else entirely, so one generic prompt doesn’t cut it.
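One way to picture role-specific output shaping is a small prompt builder keyed by role. This is only a sketch: the role names, the instruction strings, and `buildSummaryPrompt` are hypothetical and not taken from Granola's product, and the HR entry is a deliberate placeholder because the talk does not say what HR needs.

```typescript
// Hypothetical roles and instructions, for illustration only.
type Role = "default" | "sales" | "engineering" | "hr";

const ROLE_INSTRUCTIONS: Record<Role, string> = {
  default: "Summarize the key decisions and follow-ups.",
  sales: "Focus on the deal: status, objections, and next steps.",
  engineering: "List action items and blockers, phrased so each could become a ticket.",
  // Placeholder: the source does not describe HR's needs.
  hr: "Placeholder instruction to be defined together with HR users.",
};

// Builds a role-specific prompt from the same transcript and user notes.
function buildSummaryPrompt(role: Role, transcript: string, userNotes: string): string {
  return [
    "You are writing meeting notes.",
    ROLE_INSTRUCTIONS[role],
    "User's own notes:",
    userNotes,
    "Transcript:",
    transcript,
  ].join("\n");
}

// Same meeting, two different prompts.
const salesPrompt = buildSummaryPrompt("sales", "(transcript text)", "(user notes)");
const engineeringPrompt = buildSummaryPrompt("engineering", "(transcript text)", "(user notes)");
```

Keeping the instructions in a plain table rather than inside one large prompt means each role's output can be reviewed and changed on its own, which fits the framing of output shaping as a product problem.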
To get inside that black box, Granola built its own tracing tools. The key value wasn’t just logging tool calls and costs end to end; it was structuring the data and UI so that product, data, CX, and leadership could inspect failures without digging through CloudWatch. He notes that their founder literally follows agent loops front to back to pinpoint what went wrong.
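To make "structuring the data" concrete, here is a minimal sketch of a per-step trace record and a roll-up that a trace UI could display. Everything in it is an assumption: the field names, `TraceRecorder`, and the pricing inputs are hypothetical, the steps live in memory rather than a database, and the wrapper around the model SDK that would call `record` after each step is left out.

```typescript
// Hypothetical trace schema; not Granola's actual data model.
type StepKind = "reasoning" | "tool_call" | "search" | "response";

interface TraceStep {
  traceId: string; // one id per chat turn or agent loop
  kind: StepKind;
  name: string; // e.g. a tool or model name
  inputTokens: number;
  outputTokens: number;
  durationMs: number;
}

// Assumed pricing shape; real prices depend on the provider and model.
interface Pricing {
  pencePer1kInput: number;
  pencePer1kOutput: number;
}

interface TraceSummary {
  traceId: string;
  stepCount: number;
  totalTokens: number;
  costPence: number;
  slowestStep: string | null;
}

class TraceRecorder {
  private steps: TraceStep[] = [];

  // An SDK wrapper would call this after every step of the agent loop.
  record(step: TraceStep): void {
    this.steps.push(step);
  }

  // Rolls one trace up into numbers a non-engineer can scan.
  summarize(traceId: string, pricing: Pricing): TraceSummary {
    const steps = this.steps.filter((s) => s.traceId === traceId);
    let input = 0;
    let output = 0;
    let slowest: TraceStep | null = null;
    for (const s of steps) {
      input += s.inputTokens;
      output += s.outputTokens;
      if (slowest === null || s.durationMs > slowest.durationMs) slowest = s;
    }
    const costPence =
      (input / 1000) * pricing.pencePer1kInput +
      (output / 1000) * pricing.pencePer1kOutput;
    return {
      traceId,
      stepCount: steps.length,
      totalTokens: input + output,
      costPence,
      slowestStep: slowest ? slowest.name : null,
    };
  }
}
```

Recording one row per step, rather than one log line per request, is what would let someone follow an agent loop front to back and see which search or tool call drove the cost or latency.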
The next bottleneck wasn’t the model — it was product iteration inside Electron. Because only one app instance could run at a time and coworkers needed local setup to test changes, they turned the Electron frontend into a deployable web shell, abstracted IPC APIs to fall back to web standards, and made the renderer effectively Electron-agnostic.
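The IPC abstraction can be sketched as a small platform interface with two implementations. This is an illustration under assumptions, not Granola's code: `PlatformApi`, `HostGlobals`, `electronBridge`, and `createPlatformApi` are hypothetical names, and the sketch assumes a preload script has exposed the bridge object on the global scope (for example via Electron's `contextBridge.exposeInMainWorld`).

```typescript
// What the renderer is allowed to ask of its host (hypothetical surface).
interface PlatformApi {
  readonly kind: "electron" | "web";
  openExternal(url: string): void;
  getAppVersion(): string;
}

// The parts of the global scope this sketch looks at.
// `electronBridge` is an assumed name that a preload script would expose.
interface HostGlobals {
  electronBridge?: {
    openExternal(url: string): void;
    getAppVersion(): string;
  };
  open?: (url: string, target?: string) => unknown;
}

// Picks the Electron bridge when present, otherwise falls back to web standards.
function createPlatformApi(host: HostGlobals): PlatformApi {
  const bridge = host.electronBridge;
  if (bridge) {
    return {
      kind: "electron",
      openExternal: (url) => bridge.openExternal(url),
      getAppVersion: () => bridge.getAppVersion(),
    };
  }
  return {
    kind: "web",
    // In a browser, window.open is the standard way to open a new tab.
    openExternal: (url) => {
      if (host.open) host.open(url, "_blank");
    },
    // Assumed label for preview builds; a real build might inject a commit hash.
    getAppVersion: () => "web-preview",
  };
}
```

In the renderer this would be created once, for instance with `createPlatformApi(window)`, and UI code would only ever call `PlatformApi` methods. Because the renderer never touches IPC directly, the same bundle can run inside Electron or behind a preview link.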
That shift gave them web-style preview links on every PR and made it much easier to test multiple feature variants in parallel. Hassan says Cursor now opens PRs, tests changes, and uploads screenshots, which compounds the speedup and lets the team evaluate product feel in practice instead of just staring at Figma.
His final message is the title of the talk: you can’t just one-shot it. The goal is to create a fast feedback loop that feels like playing tennis with the LLM, so what ships has conviction behind it and lands with users as magic rather than as a flaky black box.