
Playbook
Tasteful Skills
“Tasteful Skills” argues that the best agent skills are not documentation or best-practice lists.
Winning in vertical AI is mostly an organizational problem, not a model race — Chris Lovejoy argues that with frontier models now “good enough,” the real moat is how a company operationalizes expert judgment around specific workflows, especially given Gartner’s stat that about 50% of generative AI projects were abandoned last year.
Lovejoy’s core framework is three roles for domain expertise: oracle, evaluator, architect — the oracle directly tweaks prompts and product behavior, the evaluator defines measurable quality and builds the review system, and the architect designs automated feedback loops so the product improves from usage at scale.
You usually do need domain expertise, but it doesn’t always mean hiring a traditional credentialed expert — the essential requirement is judgment about what “good” looks like in your use case, which can come from formal experts like doctors and lawyers or informal experts already inside the company.
The right structure depends on whether quality is measurable and whether manual iteration is fast enough — if quality is subjective, like meeting notes, a strong oracle can work for a long time; if quality is measurable and scale outpaces human fixes, you need to evolve toward evaluator and architect systems.
His case studies show the pattern in real companies: Granola stayed oracle-heavy, Tandem decentralized the oracle, and Anterior moved through all three stages — Joe at Granola still acts as the quality gatekeeper for AI meeting notes, Tandem hired doctors across specialties and geographies for prompt customization, and Anterior built clinician review dashboards and later automated improvement for prior authorization decisions.
The practical advice is to hire a principal domain expert early and give them real ownership — Lovejoy warns against treating experts as part-time advisors or splitting authority across committees, citing a company where two senior clinicians with ambiguous ownership moved slowly and both left after 12–18 months.
Chris Lovejoy opens with a blunt claim: if you want to build better AI products, you need a “domain native AI organization.” He comes at this as a Cambridge-trained former doctor who worked in the NHS, then moved into AI at places like Tandem and Anterior, where the recurring problem was always the same: how do you actually bake expert judgment into the product?
He frames vertical AI as a massive opportunity, echoing the VC excitement around AI moving beyond software into labor itself — from a roughly $50 billion vertical SaaS market toward something much larger. But he points out the ugly reality too: Gartner says about 50% of generative AI projects were abandoned last year, and his explanation is that companies are trying to automate workflows they don’t deeply understand.
Lovejoy’s framework is the heart of the talk. The oracle is the expert who both judges outputs and directly improves the product — often by tweaking prompts, adding documents, or changing tools. The evaluator still defines what quality means, but turns that judgment into metrics and review systems; the architect goes one step further and designs the machinery for automated learning and improvement, with much less human-in-the-loop work.
His decision tree is refreshingly practical: first ask whether quality can actually be captured in a meaningful metric, or whether it’s more about taste. If it’s not measurable, you want an oracle. If it is measurable, then ask whether manual iteration by engineers is fast enough: if yes, an evaluator may be enough, and if no, you’ll need architect-style automation. He also stresses that the answer can change over time, especially as a startup grows.
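The decision tree is small enough to write down as a function. The sketch below is only an illustration of the two questions as summarized here, not anything Lovejoy presented as code; the names `ExpertRole`, `recommend_role`, and both boolean parameters are hypothetical.

```python
from enum import Enum


class ExpertRole(Enum):
    """The three roles for domain expertise described in the talk."""
    ORACLE = "oracle"
    EVALUATOR = "evaluator"
    ARCHITECT = "architect"


def recommend_role(quality_is_measurable: bool,
                   manual_iteration_fast_enough: bool) -> ExpertRole:
    """Hypothetical helper mirroring the two-question decision tree.

    Both inputs are judgment calls a team would make for its own
    product; nothing here measures them automatically.
    """
    # Question 1: is quality a matter of taste rather than a metric?
    if not quality_is_measurable:
        return ExpertRole.ORACLE
    # Question 2: can engineers keep up by iterating manually?
    if manual_iteration_fast_enough:
        return ExpertRole.EVALUATOR
    # Measurable quality, but scale outpaces manual fixes.
    return ExpertRole.ARCHITECT


if __name__ == "__main__":
    # A taste-driven product such as meeting notes maps to the oracle.
    print(recommend_role(False, True).value)
    # Measurable quality with too much variation to fix by hand.
    print(recommend_role(True, False).value)
```

Because the right answer can shift as a company grows, a team would re-ask both questions periodically rather than treat the result as fixed.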
His first case study is Granola, the AI meeting-notes company now valued at over $1 billion. He highlights Joe, an early employee with a writer/journalist background, who wrote all the prompts and spent “many, many hours” reading papers and talking to hundreds or thousands of users to understand what makes a good meeting note. That works as an oracle model because there’s no single objectively perfect meeting note: taste matters, and the product’s core output is narrow enough that one strong quality gatekeeper can cover it.
Tandem, which builds a medical AI scribe, started similarly with Roy — a doctor who had also been at McKinsey — reviewing notes and updating prompts himself. But scale broke the one-person model, so the company hired multiple doctors across specialties, countries, and note types, effectively creating a decentralized oracle system. The key detail here is the long tail: thousands of prompt variants tuned for different medical contexts, each needing someone who actually understands that slice of the workflow.
Lovejoy then uses his own experience at Anterior, a prior authorization startup, as the cleanest example of the full progression. He began as the oracle — building prompts and code, then putting on his “doctor hat” to assess whether approval decisions were clinically appropriate — but as customer variation grew, he defined metrics and failure modes, built a clinician review dashboard, and hired clinicians to produce scalable evaluation data. Even that eventually wasn’t enough, because insurers interpreted policies differently, so the product needed architect-style systems that could learn and adapt at the edge.
He closes with the management lesson: designate a principal domain expert with actual accountability for AI quality. Don’t reduce them to an advisor, and don’t split authority so broadly that nobody really owns the call — he shares a cautionary example of a company with two senior clinicians, fuzzy ownership, slow progress, and both leaders leaving after 12 to 18 months. His hiring advice is to optimize for relevant domain experience first, then stack as many adjacent skills as possible — prompting, data science intuition, product sense, leadership, even engineering — so the person can grow from oracle into evaluator or architect as the company matures.