Every · 58m

Why We Switched From Claude Code to Codex

TL;DR

  • Every fully moved from Claude Code to Codex for day-to-day knowledge work — Dan Shipper says Codex went from “trash” 3-6 months ago to his “daily driver,” and Austin now spends roughly 80% of his workday inside the Codex desktop app.

  • The real battle isn’t chatbot vs chatbot — it’s agent operating systems for work — Dan frames Codex, Claude Code/Cowork, and similar tools as a new desktop-based “agent management interface” that becomes the primary surface for Gmail, Slack, Notion, Stripe, files, and the browser.

  • Codex won for Every because the app is faster and better organized, not just because the model improved — Austin says GPT-5.5 reached rough parity with Opus for his work, but Codex’s desktop experience, subagents, folders, persistent chats, and automation flow made the decisive difference.

  • Their highest-leverage use case is having Codex assemble work they’ve already thought through — instead of asking AI to invent strategy, Austin uses it to read meeting transcripts, Slack threads, calendars, and templates, then draft documents like the go-to-market plan for Plus One, which came out 80-90% done in minutes.

  • Automation is most useful when it starts dumb and reliable — Austin’s examples include end-of-day reply drafts, event run-of-show generation, email triage, and recruiting pipelines, with a final human review happening in Gmail, Slack, or Notion before anything goes out.

  • The ceiling is high, but trust still requires human judgment on metrics and outputs — when rebuilding Every’s KPI tracker in Notion, Austin found Codex could get 90-95% there, but core business metrics like MRR still had to be checked column by column because being even 3-5% off is unacceptable.

The Breakdown

From “trash” to daily driver

Dan opens with a blunt reversal: 3-6 months ago, Codex was “trash,” built for senior engineers, emotionally tone-deaf, and weirdly combative. His big claim now is that OpenAI pivoted hard after seeing what Anthropic unlocked with Claude Code: if you have a coding agent on your computer, you don’t just have a programming tool — you have a general-purpose knowledge-work machine.

The new operating system is an agent desktop

Dan’s broader thesis is that work is moving into an “agent management interface,” a desktop surface where the model becomes your way into software, files, and the internet. He frames this as a race: Anthropic has Claude Code/Cowork, OpenAI has Codex, xAI has effectively moved via Cursor, and Google will likely join with something more serious than Antigravity.

Austin’s conversion story: Claude first, then Codex wins

Austin says his “agent pill moment” came in December/January, when he spent a week deep in Claude Code through Warp and wired it into work and life. He initially resisted Codex because, two months earlier, it made him “feel more stupid than anything”: it would ask architecture questions and then basically reply “why?” when he asked for clarification. The latest GPT model changed that, and the app experience sealed it.

Why the app matters more than people think

For Austin, the biggest differentiator isn’t just model quality; it’s that the Codex desktop app is fast, organized, and actually pleasant to live in. He contrasts it with Claude’s desktop experience by saying Codex can handle parallel tasks — like shipping a PR while drafting a go-to-market plan — without getting clunky, and now it’s the first app he opens every morning.

A real workflow: Codex as growth OS

Austin demos a folder-based setup called “Every Growth OS,” connected to Gmail, Slack, Notion, Stripe, and other company systems, with local files, project instructions, and custom reviewer agents. His favorite onboarding trick is simple: ask Codex to inspect your tools and suggest automations, then let it build things like follow-up radars, event command centers, or recruiting trackers that mostly just work.

Human review still happens outside the agent

When asked how he stays safe, Austin explains that Codex drafts inside the agent, but the final approval lives in the destination app: Slack drafts get checked in Slack, Gmail drafts get checked in Gmail, and strategy docs land in Notion or Proof for a final pass. He likes the cognitive reset of stepping out of the agentic workspace before something reaches another human.

The killer use case: assembling strategy from scattered context

Austin’s favorite example is building Every’s go-to-market plan for Plus One. Instead of asking the model to invent strategy, he asks it to gather what already exists across recorded meetings, Slack debates, templates, and launch calendars, then produce a draft; one version came out 80-90% complete in the gaps between meetings, which he says would previously have required blocking off a full day or staying up late.

Where AI is amazing — and where it still isn’t enough

The final big example is rebuilding Every’s KPI sheet in Notion so both humans and agents can act on a single source of truth. Codex can wire together Notion, Stripe, social data, scripts, and six-hour refreshes, but Austin says the last mile still matters: metrics like MRR are philosophical as much as technical, so he’s validating the system column by column because a business can’t run on numbers that are even slightly wrong.
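
The column-by-column check Austin describes can be pictured as a strict comparison between what the tracker reports and an independently verified figure. The sketch below is only an illustration under stated assumptions: the function name, metric names, and numbers are all hypothetical, and it is not Every's actual validation script. It shows why a loose tolerance is risky for a metric like MRR: a 3.5% error passes a 5% tolerance but fails an exact check.

```python
from math import isclose

def find_mismatches(tracker, reference, rel_tol=0.0):
    """Return {metric: (tracker_value, reference_value)} for metrics that disagree.

    rel_tol=0.0 demands an exact match; raise it only for metrics
    where small drift is genuinely acceptable.
    """
    mismatches = {}
    for metric, expected in reference.items():
        actual = tracker.get(metric)
        if actual is None or not isclose(actual, expected, rel_tol=rel_tol):
            mismatches[metric] = (actual, expected)
    return mismatches

# Made-up numbers for illustration only (hypothetical metric names).
reference = {"mrr": 100_000.0, "paid_subscribers": 5_000.0}  # hand-verified values
tracker = {"mrr": 96_500.0, "paid_subscribers": 5_000.0}     # what the agent-built tracker shows

strict = find_mismatches(tracker, reference)               # exact comparison
loose = find_mismatches(tracker, reference, rel_tol=0.05)  # 5% tolerance

print(strict)  # the 3.5% MRR gap is flagged
print(loose)   # the same gap slips through
```

With the strict check, only the MRR row is flagged for a human to inspect; with a 5% tolerance, nothing is flagged even though the tracker is off by 3,500.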
