How I AI · 30m

I let Codex run for 6 hours. Here’s what happened.

TL;DR

  • Goals turn AI from turn-taking into self-management — Claravo frames /goal in Codex as a loop of work, verify, and choose-the-next-step, instead of the usual "okay, what's next?" prompting pattern.

  • The best goals look like OKRs with guardrails — strong goals specify an outcome, verification method, constraints, boundaries, iteration policy, and stopping condition, like reducing P95 checkout latency while keeping the correctness suite green.

  • A real production bug burn-down took nearly 6 hours and ended at zero errors — in ChatPRD, Claravo gave Codex access to Sentry, had it categorize every invalid edit operation, fix root causes, replay historical failures, and systematically eliminate the whole class of issues.

  • This isn't just for engineers: inbox cleanup was the breakout demo. With Gmail access, Codex spent 3 hours and 52 minutes and about 6 million tokens reading, labeling, and unsubscribing, and reduced roughly 3,900 emails to just 68 needing human review.

  • Project management cleanup is another strong non-code use case — Claravo points /goal at a messy Linear backlog and has it cancel stale podcast tasks from already-released episodes so only future work stays open.

  • The limitation is clarity, not ambition — goals are a bad fit for one-line edits or vague asks like "make customers happy"; they work best when the objective is durable, the finish line is measurable, and the path requires multiple rounds of investigation.
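
To make the six-part goal structure and the work, verify, choose-the-next-step loop concrete, here is a minimal Python sketch. It is not how Codex implements /goal; `GoalSpec`, `run_goal`, and every field name are hypothetical, and the "work" is a stand-in function that lowers a simulated error count.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GoalSpec:
    """Hypothetical container for the six parts of a well-formed goal."""
    outcome: str                      # what "done" means, in words
    verify: Callable[[], float]       # verification method: returns the measured value
    target: float                     # stopping condition: measured value <= target
    constraints: List[str] = field(default_factory=list)  # e.g. tests stay green
    boundaries: List[str] = field(default_factory=list)   # what must not be touched
    max_iterations: int = 10          # iteration policy: give up after this many rounds


def run_goal(goal: GoalSpec, work_step: Callable[[int], None]) -> dict:
    """Work, verify, then decide whether to continue or stop."""
    history: List[float] = []
    for i in range(1, goal.max_iterations + 1):
        work_step(i)                  # one round of work
        measured = goal.verify()      # evidence, not self-assessment
        history.append(measured)
        if measured <= goal.target:   # finish line reached
            return {"done": True, "iterations": i, "history": history}
    # Iteration budget exhausted without reaching the target.
    return {"done": False, "iterations": goal.max_iterations, "history": history}


if __name__ == "__main__":
    # Simulated error count standing in for a real monitoring query.
    errors = {"count": 5}

    def fix_some(_round: int) -> None:
        errors["count"] = max(0, errors["count"] - 2)

    goal = GoalSpec(
        outcome="Zero invalid edit operation errors",
        verify=lambda: errors["count"],
        target=0,
        constraints=["keep the correctness suite green"],
        boundaries=["do not change unrelated modules"],
    )
    print(run_goal(goal, fix_some))
```

The design choice worth noting is that the loop stops only on a measurement returned by `verify` or on the iteration budget, which mirrors the point that a goal needs a measurable finish line. A vague ask like "make customers happy" has no obvious `verify` function to plug in.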

The Breakdown

A single Codex goal ran for 5 hours and 45 minutes and wiped out an entire class of production edit errors. Claravo then showed the same workflow cutting roughly 3,900 emails down to 68 that needed human review. The big idea: stop babysitting AI turn by turn and start giving it measurable outcomes with evidence-based finish lines.
