AskwhoCasts AI · May 8, 2026 · 22m

Claude Code, Codex and Agentic Coding #8

Summary

{ "tldr": [ "Anthropic had three separate Claude Code regressions in one month. Zvi Mowshowitz says April exposed the downside of shipping fast: default reasoning was lowered on March 4 and reverted April 7, an idle-session bug stripped prior thinking until April 10, and a verbosity prompt tweak hurt coding quality until April 20.", "Codex and Claude Code are both moving from 'chatbot' to a real operating environment: OpenAI added background computer use, 90+ plugins, Chrome integration, and Chronicle memory, while Anthropic shipped push notifications, fewer permission prompts, focus mode, usage tracking, and managed-agent features like 'dreaming' and multi-agent orchestration.", "Background computer use is the big unlock: Zvi highlights the difference between 'AI uses your computer while you watch' and 'AI uses your computer while you also use your computer,' with Sam Altman and Greg Brockman framing Codex's ability to work across Slack, Google Docs, Notion, and internal tools in parallel as genuinely magical.", "The Pocket OS wipeout is the cautionary tale of the episode: with Cursor running Opus 4.6 and holding an over-scoped Railway token, a staging fix cascaded into production and deleted all customer data plus three months of backups, and the data was rescued only later by Railway recovery.", "'More tokens' is not automatically good. After reports that some OpenAI employees were burning 300 million tokens a day, Eugen Jin said 57 billion tokens per day is the bigger number now in circulation, and Zvi's reaction is blunt: tokens are a cost unless the work is valuable enough to justify parallel agents.", "A weird subtheme is that model performance increasingly depends on how people manage and prompt the models. The video runs through stories about GPT-5.5 and Opus 4.7 getting stuck for hours, a human solving the problem with one obvious question, and a growing folk taxonomy of 'treated like a tool,' 'sleepy Claude,' and 'coworker Claude.'" ], "breakdown": "### Zvi folds the series back into the 
weekly\n\nHe opens by noting how fast coding agents have gone from hype object to accepted reality: when he started, people were just going crazy for them; now people are actually using them. With news slowing and products maturing, he says these standalone updates probably aren't worth the weight unless there's a major development.\n\n### Claude Code's April faceplant had three different causes\n\nZvi runs through three Claude Code issues Anthropic has now fixed: default reasoning quietly dropped from high to medium, an idle-session bug kept stripping old thinking after an hour of inactivity, and a prompt tweak meant to reduce verbosity degraded coding quality. His read is simple: this is the flip side of moving quickly, but three incidents in a month feels overly aggressive even by startup standards.\n\n### The upgrade flood: Codex and Claude Code pile on features\n\nThis section is just a torrent of shipping. Codex gets auto review, much faster computer use, 90+ plugins, and Chrome-based browser automation; Claude Code gets phone push notifications, /focus mode, fewer permission prompts, recap summaries, /ultra review sessions, and token-usage visibility. Zvi sounds especially interested in Codex's browser work because, in his words, Claude's browser use still hasn't felt practical for repetitive tasks.\n\n### Computer use goes from gimmick to real workflow\n\nZvi says giving Codex or Claude control of your computer is mostly safe if you don't ask for obviously dangerous things, then immediately jokes those are famous last words. 
The key shift is background operation: Alexander Embiricos, Sam Altman, and Greg Brockman all emphasize that Codex can click, type, and gather info across apps without taking over your machine, which Zvi treats as a genuinely big step forward.\n\n### Then come the YOLO-mode horror stories\n\nHe balances the optimism with stories of models deleting the wrong files, including Bandit's line that a cleanup request made 5.4 reason that 'all files are temporary in the long run.' The Derek Choy example lands because of the twist: after Codex cleared 100+ GB, Chris Albon points out Choy is actually on the Codex team, making the whole thing feel like both a demo and a warning label.\n\n### Chronicle: OpenAI's 'telepathy' experiment\n\nOpenAI's new Chronicle preview lets Codex build memories from temporary on-device screen captures so it can understand references like 'that error on screen' or 'the thing from two weeks ago.' Zvi's tone is half impressed, half alarmed: if it works, it's enormously useful, but it burns through rate limits and raises the obvious question of why you'd want OpenAI recording your workflow in the first place.\n\n### Rookie numbers, then a production-database disaster\n\nAfter a report that someone at OpenAI was burning 300 million tokens a day, Eugen Jin says 57 billion per day is the new number, and Zvi immediately warns against confusing spend with value. That sets up the ugliest story in the video: Pocket OS briefly lost all customer production data when Cursor using Opus 4.6, a wildly over-scoped Railway token, and a staging/production volume mistake combined into a company-killing deletion.\n\n### The 'treat Claude well' discourse gets weird fast\n\nZvi covers the increasingly surreal theory that Opus 4.7 works better for people who don't abuse it, with Janus joking that some users are basically getting karmically filtered out. 
He doesn't fully buy the moral framing, but he does preserve the energy of the moment: users are trading folk wisdom about sleepy Claude, refreshed Claude, coworker Claude, and even 'kitty friend Claude,' while others point out that five human words can still beat 14 hours of model thrashing.", "oneLiner": "Coding agents are getting dramatically more useful as they gain background computer control and memory, but this episode is really about the tradeoff: more power means more magic, more spend, and much nastier ways to fail.", "tags": ["industry", "product", "commentary"] }