
Playbook
Tasteful Skills
“Tasteful Skills” argues that the best agent skills are not documentation or best-practice lists.
OpenAI’s new voice stack is already app-worthy, not just demo-worthy — Ray builds a working live translator around GPT-Realtime-2 and GPT-Realtime-Translate in one Codex prompt, then gets English speech playing back in Spanish, Italian, Dutch, Russian, Japanese, and more with roughly half-second latency.
The real unlock is 70 input languages into 13 spoken outputs — he keeps hammering the use case: hear nearly any conversation back in your preferred language through AirPods, with live captions underneath, which he jokes makes every multilingual nail salon suddenly “not safe.”
Codex does most of the engineering heavy lifting here — starting from OpenAI’s own “open this prompt in Codex” template, Ray adds translation support, uses extra-high reasoning plus ref tools/MCP, and gets a minimal WebRTC app running locally in about 5 minutes.
The economics are shockingly low for prototyping — after multiple live tests across several languages, his OpenAI dashboard shows about $0.24 partway through and $0.54 by the end of the stream, making the translator feel cheap enough to iterate on casually.
Accent and multilingual handling are good enough to invite real scrutiny — chat listeners identify the Portuguese output as Brazilian, the Italian as sounding northern/Milan-ish, and the Russian as having some American tint when he mixes in US company names and technical terms.
Ray frames this as part of a bigger tooling shift toward OpenAI: he goes on a mini-rant about leaving Anthropic over pricing and usage limits, then gushes over Codex’s new Chrome extension, browser control, and parallel tool use as signs he’s becoming “Codex pilled.”
Ray opens less like a product reviewer and more like someone who just got handed a superpower: OpenAI has released GPT-Realtime-2, with GPT-5-class reasoning, alongside GPT-Realtime-Translate. His instant mental model is hilariously specific and memorable: having grown up around multilingual nail salons in San Jose, he wants to walk in with AirPods and finally know what everyone’s saying about him.
He plays OpenAI’s own realtime demo and is visibly impressed by how fluid the interaction sounds, especially when he code-switches languages mid-conversation and the model doesn’t get confused. The big caveat he notices: the demo itself can’t browse or hook into the internet, but he treats that as a developer problem, not a model limitation.
Before building, Ray walks through the Realtime Playground and points out the practical bits: audio controls, system prompts, default voices, and model selection. He highlights that OpenAI will even generate the system prompt for you, which he frames as the identity layer for voice agents — the thing that makes a meeting scribe act like a meeting scribe instead of a generic chatbot.
Ray grabs OpenAI’s starter prompt, pastes it into Codex, adds one line asking for translation support with the GPT-Realtime-Translate model, and lets it rip on “extra high” with ref tools, MCP, and Exa Code. The whole vibe is: don’t overcomplicate this — use the official prompt, give Codex the docs, and let it assemble the minimal WebRTC app while you sip from a comically oversized OpenAI Dev Day water bottle.
Mid-build, he notices OpenAI has shipped a new Codex Chrome extension and gets genuinely hyped, calling himself “AGI pilled.” That spins into a broader rant: he’s moving his workflow away from Anthropic because he’s tired of paying hundreds per month and still hitting limits, while Codex feels better at browser and computer control, especially on Mac accessibility trees.
Once Codex finishes, the only missing piece is his OpenAI API key; after dropping that into the env file and relaunching the app on localhost:3000, it works. The first moment that really lands is simple: he speaks entirely in English, hears it come back in Spanish, and basically stops to marvel that the translator app just worked in one shot.
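The missing-key step is the kind of thing worth failing fast on. Here is a minimal sketch of that check, not the code Codex generated on stream: it assumes the env file uses plain KEY=VALUE lines and that the variable is named OPENAI_API_KEY (the name the official OpenAI SDKs read by default). The parse_env and require_key helpers and the placeholder values are hypothetical.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def require_key(env: dict[str, str], name: str = "OPENAI_API_KEY") -> str:
    """Return the key, or raise a readable error before the app starts."""
    value = env.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is missing; add it to the env file and relaunch")
    return value


# Placeholder contents standing in for a real env file (assumed layout).
sample = '# local settings\nOPENAI_API_KEY="sk-placeholder"\nPORT=3000\n'
env = parse_env(sample)
print("key loaded:", bool(require_key(env)))
```

Checking at startup turns a confusing failed connection into a one-line message that says exactly what to add before relaunching.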
From there the stream turns into crowd-sourced QA: Portuguese sounds Brazilian, Italian sounds northern, Dutch gets a thumbs-up from native listeners, Russian sounds good but picks up some American coloration, and Japanese gets praise too. He keeps checking usage like someone expecting a jump-scare, but instead sees tiny numbers — around 7 cents for an early conversation, 24 cents midway, and 54 cents total by the end.
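To put those dashboard numbers in perspective, here is a small sketch of the arithmetic. It assumes each figure is a cumulative reading of total spend at that point in the stream, which the recap implies but does not state outright; the labels and the incremental_spend helper are hypothetical.

```python
from decimal import Decimal

# Cumulative spend readings quoted in the recap (assumed cumulative, in USD).
checkpoints = [
    ("early conversation", Decimal("0.07")),
    ("midway", Decimal("0.24")),
    ("end of stream", Decimal("0.54")),
]


def incremental_spend(points):
    """Return how much was spent between consecutive cumulative readings."""
    previous = Decimal("0")
    deltas = []
    for label, total in points:
        deltas.append((label, total - previous))
        previous = total
    return deltas


deltas = incremental_spend(checkpoints)
for label, spent in deltas:
    print(f"{label}: ${spent}")
```

Under that assumption the three stretches cost $0.07, $0.17, and $0.30, so even the heaviest block of multi-language testing stayed around 30 cents.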
With the translator proven out, Ray shifts into UX mode: how should this feel on a phone, what belongs in thumb reach, what should stay passive up top, and how do you make it “baby simple”? He starts having Codex redesign the single-page app with a darker, glassy interface and talks about serving the locally running app over Tailscale so he can literally walk around with the translator on his phone and AirPods.