Ray Fernando · May 8, 2026 · 1h 30m

GPT-Realtime-2: Building a Live Translator

TL;DR

OpenAI’s new voice stack is already app-worthy, not just demo-worthy — Ray builds a working live translator around GPT-Realtime-2 and GPT-Realtime-Translate in one Codex prompt, then gets English speech playing back in Spanish, Italian, Dutch, Russian, Japanese, and more with roughly half-second latency.
The real unlock is 70 input languages into 13 spoken outputs — he keeps hammering the use case: hear nearly any conversation back in your preferred language through AirPods, with live captions underneath, which he jokes makes every multilingual nail salon suddenly “not safe.”
Codex does most of the engineering heavy lifting here — starting from OpenAI’s own “open this prompt in Codex” template, Ray adds translation support, uses extra-high reasoning plus ref tools/MCP, and gets a minimal WebRTC app running locally in about 5 minutes.
The economics are shockingly low for prototyping — after multiple live tests across several languages, his OpenAI dashboard shows about $0.24 partway through and $0.54 by the end of the stream, making the translator feel cheap enough to iterate on casually.
Accent and multilingual handling are good enough to invite real scrutiny — chat listeners identify the Portuguese output as Brazilian, the Italian as sounding northern/Milan-ish, and the Russian as having some American tint when he mixes in US company names and technical terms.
Ray frames this as part of a bigger tooling shift toward OpenAI — he goes on a mini-rant about leaving Anthropic due to pricing and usage frustration, then gushes over Codex’s new Chrome extension, browser control, and parallel tool use as signs he’s becoming “Codex filled.”

Summary

OpenAI Drops New Voice Models, and Ray Immediately Sees the Street-Level Use Case

Ray opens less like a product reviewer and more like someone who just got handed a superpower: OpenAI has released GPT-Realtime-2 with GPT-5-class reasoning, plus GPT-Realtime-Translate. His instant mental model is specific and memorable: having grown up around multilingual nail salons in San Jose, he wants to walk in with AirPods and finally know what everyone is saying about him.

The Official Demo Sells Him on the Quality

He plays OpenAI’s own realtime demo and is visibly impressed by how fluid the interaction sounds, especially when he code-switches languages mid-conversation and the model doesn’t get confused. The big caveat he notices: the demo itself can’t browse or hook into the internet, but he treats that as a developer problem, not a model limitation.

Playground, Prompts, and Why System Instructions Matter

Before building, Ray walks through the Realtime Playground and points out the practical bits: audio controls, system prompts, default voices, and model selection. He highlights that OpenAI will even generate the system prompt for you, which he frames as the identity layer for voice agents — the thing that makes a meeting scribe act like a meeting scribe instead of a generic chatbot.
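To make the "identity layer" idea concrete, here is a minimal sketch of how an app might keep one system prompt per persona and fail loudly when a persona has none. The `VoiceAgentSettings` shape, the persona names, and the prompt text are all hypothetical; none of it is taken from OpenAI's session schema or from Ray's app.

```typescript
// Hypothetical app-level settings; NOT OpenAI's actual session schema.
interface VoiceAgentSettings {
  persona: string;
  instructions: string; // the system prompt that gives the agent its identity
}

// Illustrative personas and prompt text (assumptions made for this sketch).
const PERSONA_PROMPTS: Record<string, string> = {
  "meeting-scribe":
    "You are a meeting scribe. Capture decisions and action items. Do not chat.",
  translator:
    "You are a live translator. Repeat what you hear in the target language only.",
};

// Look up the prompt for a persona, refusing to start a session without one.
function settingsFor(persona: string): VoiceAgentSettings {
  const instructions = PERSONA_PROMPTS[persona];
  if (instructions === undefined) {
    throw new Error(`No system prompt defined for persona "${persona}"`);
  }
  return { persona, instructions };
}
```

Swapping the persona key is the only change needed to turn the same voice session from a scribe into a translator, which is the point Ray makes about instructions carrying the agent's role.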

One Prompt in Codex to Build the App

Ray grabs OpenAI’s starter prompt, pastes it into Codex, adds one line asking for translation support with the GPT-Realtime-Translate model, and lets it rip on “extra high” with ref tools, MCP, and Exa Code. The whole vibe is: don’t overcomplicate this — use the official prompt, give Codex the docs, and let it assemble the minimal WebRTC app while you sip from a comically oversized OpenAI Dev Day water bottle.
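For readers who have not built a WebRTC app, the core of a "minimal" one is an offer/answer handshake. The sketch below shows that handshake in isolation, under stated assumptions: `PeerLike` mirrors only the subset of the browser's `RTCPeerConnection` methods used here, and `exchange` is a hypothetical stand-in for whatever request carries the SDP offer to the realtime service and returns its answer. It is not the code Codex generated on stream.

```typescript
// Subset of the browser RTCPeerConnection surface this sketch relies on.
interface PeerLike {
  createOffer(): Promise<{ sdp?: string }>;
  setLocalDescription(desc: { type: "offer"; sdp?: string }): Promise<void>;
  setRemoteDescription(desc: { type: "answer"; sdp: string }): Promise<void>;
}

// Hypothetical transport: sends the offer SDP somewhere, resolves with the answer SDP.
type SdpExchange = (offerSdp: string) => Promise<string>;

// Standard WebRTC ordering: create offer, apply it locally, swap SDP, apply the answer.
async function negotiate(pc: PeerLike, exchange: SdpExchange): Promise<void> {
  const offer = await pc.createOffer();
  if (!offer.sdp) {
    throw new Error("Peer connection produced an offer without SDP");
  }
  await pc.setLocalDescription({ type: "offer", sdp: offer.sdp });
  const answerSdp = await exchange(offer.sdp);
  await pc.setRemoteDescription({ type: "answer", sdp: answerSdp });
}
```

Passing the transport in as a function keeps the handshake independent of any particular endpoint, so the same routine can be exercised with a fake peer before a real API key is involved.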

Codex Tangent: Browser Control, Chrome Extension, and Why He’s Switching

Mid-build, he notices OpenAI has shipped a new Codex Chrome extension and gets genuinely hyped, calling himself “AGI pilled.” That spins into a broader rant: he’s moving his workflow away from Anthropic because he’s tired of paying hundreds per month and still hitting limits, while Codex feels better at browser and computer control, especially on Mac accessibility trees.

API Key In, Localhost Up, and the First Real Translation Hit

Once Codex finishes, the only missing piece is his OpenAI API key; after dropping that into the env file and relaunching the app on localhost:3000, it works. The first moment that really lands is simple: he speaks entirely in English, hears it come back in Spanish, and basically stops to marvel that the translator app just worked in one shot.
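The "missing API key" step is easy to get wrong silently, so a small guard helps. This is a minimal sketch, assuming the env file uses plain `KEY=VALUE` lines and that the variable is named `OPENAI_API_KEY`; the summary does not state the actual variable name, and the generated app may load its env file differently.

```typescript
// Parse KEY=VALUE lines from .env-style text, skipping blank lines and # comments.
function parseEnv(text: string): Record<string, string> {
  const env: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line[0] === "#") continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // ignore lines without a key
    const key = line.slice(0, eq).trim();
    // Strip one optional surrounding quote on each side of the value.
    const value = line.slice(eq + 1).trim().replace(/^["']|["']$/g, "");
    env[key] = value;
  }
  return env;
}

// Fail fast with an actionable message instead of starting without credentials.
function requireKey(env: Record<string, string>, name: string): string {
  const value = env[name];
  if (!value) {
    throw new Error(`${name} is missing; add it to the env file and relaunch`);
  }
  return value;
}
```

Calling `requireKey(parseEnv(fileText), "OPENAI_API_KEY")` at startup turns a confusing failed connection into a one-line error before the app ever reaches localhost.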

Stress-Testing Languages, Accents, and Cost Live With the Chat

From there the stream turns into crowd-sourced QA: Portuguese sounds Brazilian, Italian sounds northern, Dutch gets a thumbs-up from native listeners, Russian sounds good but picks up some American coloration, and Japanese gets praise too. He keeps checking usage like someone expecting a jump-scare, but instead sees tiny numbers — around 7 cents for an early conversation, 24 cents midway, and 54 cents total by the end.
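The cost figures are easier to reason about as increments. The sketch below works through the arithmetic, assuming all three numbers are cumulative dashboard readings (the summary describes the first as the cost of an early conversation, so that is an interpretation). Amounts are kept in whole cents to avoid floating-point rounding.

```typescript
// Readings mentioned on stream, in cents; treated here as cumulative totals.
const checkpointsCents = [7, 24, 54];

// Spend added between each pair of consecutive readings.
function increments(readings: number[]): number[] {
  return readings.slice(1).map((value, i) => value - readings[i]);
}

// Render a cent amount as a dollar string, e.g. 54 becomes "$0.54".
function formatUsd(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

// Under the cumulative assumption: 17 cents, then 30 cents, between readings.
const spendBetweenReadings = increments(checkpointsCents).map(formatUsd);
```

On that reading, the second half of the stream added $0.30 to a $0.24 running total, which fits the description of the numbers staying small even as more languages were tested.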

From Working Prototype to Product Thinking

With the translator proven out, Ray shifts into UX mode: how should this feel on a phone, what belongs in thumb reach, what should stay passive up top, and how do you make it “baby simple”? He starts having Codex redesign the single-page app with a darker glassy interface and talks about deploying it locally over Tailscale so he can literally walk around with the translator on his phone and AirPods.
