AI Engineer · 8m

Give Your Chat Agent a Voice — Luke Harries, ElevenLabs

TL;DR

  • Luke Harries says 2025 made chat agents the default UI, but voice is the real upgrade — he points to products like Linear, PostHog, and even gov.uk moving to chat-first interfaces, then argues voice is faster, more natural, and more accessible for users with dyslexia or limited keyboard use.

  • ElevenLabs built a new 'voice engine' because customers didn’t want to rebuild their existing agents — teams had already invested in LLM orchestration, RAG, tool calling, evals, and transcripts, so replacing the whole stack was a non-starter.

  • The new product packages speech-to-text, text-to-speech, and turn-taking into a wrapper for any chat agent — Harries says it combines Scribe for STT, V3 for TTS, emotion- and context-aware turn taking, semantic VAD, and support for thousands of voices and languages.

  • The pitch is that adding voice should be almost trivial for developers — on the server side you create a client, create a voice engine, and attach a wrapper to your existing agent; on the client side, a few lines of code add a site widget, with telephony and other channels available out of the box.

  • The demo’s big claim is 'one prompt' can convert a local chat support agent into a voice agent — ElevenLabs plans to ship a skill that analyzes your codebase, figures out how your chat agent works, and writes the wrapper code automatically.

  • Tool calling mostly stays where it already lives: inside the existing chat agent — in Q&A, Harries says that’s the whole point of the wrapper, though ElevenLabs also supports client-side and server-side tools, including front-end actions like manipulating the DOM.

The Breakdown

Chat won, but Harries says it still doesn’t feel like the future

Luke Harries opens with a blunt read on the market: “2025 was the year of the chat agents.” He jokes that companies either “died a SaaS” or went AI-first by making chat the home screen, citing the viral pattern across products like Linear and PostHog, and even gov.uk experimenting with the same interface.

Why voice changes the game

His core argument is that chat is useful, but voice is the more natural medium. It’s faster, more interactive, more accessible for people with dyslexia or who have difficulty using a keyboard, and “omni channel” in a way text isn’t: the same agent can join a Zoom call, correct bad stats in real time, or become a customer support phone line.

What ElevenLabs learned from real customers like Revolut

Harries says ElevenLabs originally set out to build the best text-to-speech models in the world, then got pulled into the much bigger problem of how companies actually build voice agents. Working with customers, including Revolut on customer support, they kept seeing the same two-layer stack: a voice layer handling STT, TTS, and turn-taking, sitting on top of an orchestration layer handling LLMs, RAG, tools, and integrations.

The problem: nobody wanted to rip out the agent they’d already built

That architecture created a product insight. Some customers wanted an out-of-the-box system, but many had already spent serious time building their own chat agents, eval pipelines, and transcription workflows, and they weren’t about to throw that away just to add voice.

The early preview: Voice Engine as a first-class wrapper

That leads to the launch tease: a new product called Voice Engine, coming “in a couple of weeks.” Harries frames it as the voice-specific part of the stack turned into a clean primitive, bundling Scribe for speech-to-text, V3 for text-to-speech, advanced emotion- and context-aware turn taking, semantic VAD, and broad support for languages and voices.

The developer pitch is simplicity, not reinvention

He spends most of the demo on DX: you create a client, create the voice engine, and attach a small wrapper around your current chat agent. Once a session starts, the loop just proxies messages to the existing backend, while a lightweight client SDK adds a web widget in a few lines and can also unlock channels like telephony and contact-center style setups.
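To make the wrapper idea concrete, here is a minimal sketch of the pattern in Python. It is not the ElevenLabs SDK: Voice Engine had not shipped at the time of the talk, and every name below (VoiceWrapper, handle_turn, fake_stt, fake_tts, support_agent) is a hypothetical stand-in. The speech-to-text and text-to-speech steps are faked with byte/string conversion so the example runs without audio or network access. What it shows is the shape of the loop: audio comes in, text is proxied to the existing chat agent unchanged, and the reply goes back out as audio.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical type aliases; none of these come from a real SDK.
Transcriber = Callable[[bytes], str]   # speech-to-text step
Synthesizer = Callable[[str], bytes]   # text-to-speech step
ChatAgent = Callable[[str], str]       # your existing text chat agent


@dataclass
class VoiceWrapper:
    """Wraps a text chat agent with STT on the way in and TTS on the way out."""
    transcribe: Transcriber
    synthesize: Synthesizer
    agent: ChatAgent
    transcript: List[Tuple[str, str]] = field(default_factory=list)

    def handle_turn(self, audio_in: bytes) -> bytes:
        user_text = self.transcribe(audio_in)
        # The existing agent is called exactly as a text client would call it.
        reply_text = self.agent(user_text)
        # Keep a text transcript so existing evals and logging still apply.
        self.transcript.append(("user", user_text))
        self.transcript.append(("agent", reply_text))
        return self.synthesize(reply_text)


# Toy stand-ins so the sketch runs without audio hardware or API keys.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def support_agent(message: str) -> str:
    """Stand-in for the chat agent a team has already built."""
    if "refund" in message.lower():
        return "I can help with that refund."
    return "Could you tell me more?"


wrapper = VoiceWrapper(fake_stt, fake_tts, support_agent)
audio_out = wrapper.handle_turn(b"I need a refund")
```

Because the agent is passed in as a plain callable, swapping the toy support_agent for a real backend call would not change the wrapper, which is the design property the talk emphasizes.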

“One prompt” to turn a chat agent into a voice agent

Harries shows a local support bot and says the upcoming release will include a skill that inspects your codebase, identifies your chat agent, and writes the wrapper for you. His point is less about the exact code and more about the shift in abstraction: developers shouldn’t be stitching together raw TTS and STT anymore if what they really want is a voice-native agent.

Q&A: tool calling mostly stays where it already is

In the closing question, someone asks how tool calling works, and Harries’s answer reinforces the whole product thesis: your existing chat agent usually already handles most of it. ElevenLabs also supports client-side and server-side tools, including front-end actions like DOM manipulation, but the wrapper is designed so teams can keep their current tool-calling logic intact.
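The point about tool calling can be illustrated with a small Python sketch. All names here (TOOLS, chat_agent, voice_turn) are hypothetical and the tool is a toy; the speech steps are again faked with byte/string conversion. The sketch assumes the simplest case from the answer, where tools live inside the existing agent, and does not attempt to model the client-side or server-side tools ElevenLabs also supports.

```python
from typing import Callable, Dict

# Hypothetical tool registry owned by the existing chat agent.
TOOLS: Dict[str, Callable[[str], str]] = {
    "order_status": lambda order_id: f"Order {order_id} has shipped.",
}


def chat_agent(message: str) -> str:
    """Existing agent: decides on its own whether to call a tool."""
    words = message.split()
    if "order" in words and words[-1].isdigit():
        return TOOLS["order_status"](words[-1])
    return "How can I help?"


def voice_turn(audio_in: bytes, agent: Callable[[str], str]) -> bytes:
    """Voice layer: passes text in and out and never touches TOOLS."""
    text = audio_in.decode("utf-8")      # stand-in for speech-to-text
    return agent(text).encode("utf-8")   # stand-in for text-to-speech


reply = voice_turn(b"where is order 1042", chat_agent)
```

The voice layer has no reference to the tool registry, so adding or changing tools only touches the agent.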
