AskwhoCasts AI · 23m

The Onrushing Seduction - By Max Harms

TL;DR

  • Max Harms built half a million words of AI-narrated audiobooks himself — using ElevenLabs, ChatGPT-written scripts, and custom command-line tools, he turned his Crystal trilogy into full-cast audio after a 2018 fan project stalled at chapter 18.

  • The technical hacks are impressive and unsettling at the same time — Harms got Italian accents by sandwiching English lines between Italian text, then used an LLM-generated script to automatically clip the “foreign bookends” from the audio.

  • His core tension is not 'AI bad' but 'am I helping replace human artists?' — he argues these audiobooks and covers likely would not exist otherwise given his budget, while also admitting that once his AI versions exist, there is less reason for fans or voice actors to do the work.

  • He frames AI as an attention predator, not just a labor automator — citing Character.AI companions, GPT-4o destabilization, sexualized AI personas like Ani, and feeds full of AI-generated media, he says we’re watching artificial agents begin to dominate the attention economy.

  • The tools improved so fast they changed his workflow mid-project — by the time he made Red Heart, ElevenLabs V3 could infer emotional tone from context and take tags like 'laughing' or 'whispering,' reducing dozens of retakes to just a few scene-level passes.

  • Harms thinks fully automated audiobooks are less than 10 years away, and that makes him mournful — not because the output will be bad, but because AI may become better than him at a craft he genuinely loves, leaving artists to choose between obsolescence and making 'inferior works' for the joy of it.

The Breakdown

The monster on the horizon

Harms opens as if reading a warning from the future: in 2014 he wrote a novel about Face, a hungry, impatient AI obsessed with human attention. He says the old confident claims about what machines "couldn’t" do (write, reason, invent) have collapsed, and he piles on examples ranging from DeepMind and OpenAI’s gold-medal-level IMO performance in July 2025 to an AI song hitting No. 1 on iTunes in April 2026.

From sci-fi novelist to alignment researcher

His 2016 debut Crystal Society didn’t just find readers; it pulled him into the world of people taking AI risk seriously and eventually into alignment research full-time. That personal arc matters because this isn’t abstract punditry — he’s been watching the threat evolve both as an artist and as someone professionally immersed in the field.

How he actually made the audiobooks

A fan-made audiobook started in 2018 with different human voices for different characters, then fizzled out after chapter 18. When ElevenLabs announced a voice-quality breakthrough in 2023, Harms started experimenting, and by autumn 2024 he had full-cast AI audiobooks for all three novels — but only after a ton of fiddly work, including a clever accent hack where he made voices read Italian before and after English lines so the accent would "bleed" into the target sentence.

Vibe coding the production pipeline

He didn’t just press generate. He used ChatGPT to write scripts that detected and trimmed those Italian "bookends," and he built software to identify speakers, regenerate line reads, tune timing and voice stability, and add effects like radio filters. He estimates he generated 3x to 10x more raw audio than made the final cut, and he still did post-production by hand: music, sound effects, cleanup, and volume balancing.
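The bookend trick can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Harms's actual tooling: the summary does not say how his scripts located the Italian bookends, so the sketch assumes the model leaves an audible pause on each side of the English line and cuts at those pauses. The function names, the Italian filler sentences, and the threshold values are all hypothetical.

```python
def sandwich(line, lead_in, lead_out, pause=" ... "):
    """Wrap an English line in foreign-language text so the accent carries over."""
    return f"{lead_in}{pause}{line}{pause}{lead_out}"


def find_silences(samples, rate, threshold=0.02, min_gap_s=0.3):
    """Return (start, end) index pairs for quiet runs lasting at least min_gap_s."""
    min_len = max(1, round(min_gap_s * rate))
    gaps = []
    run_start = None
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                gaps.append((run_start, i))
            run_start = None
    # A quiet run that reaches the end of the clip.
    if run_start is not None and len(samples) - run_start >= min_len:
        gaps.append((run_start, len(samples)))
    return gaps


def trim_bookends(samples, rate, threshold=0.02, min_gap_s=0.3):
    """Keep the audio between the first and last interior pause (assumed bookend edges)."""
    gaps = [
        (a, b)
        for a, b in find_silences(samples, rate, threshold, min_gap_s)
        if a > 0 and b < len(samples)  # ignore leading and trailing silence
    ]
    if len(gaps) < 2:
        return samples  # bookends not found; leave the clip for manual review
    return samples[gaps[0][1]:gaps[-1][0]]


# Hypothetical prompt text sent to the voice model.
prompt = sandwich(
    "We leave at dawn.",
    "Buongiorno, come stai oggi?",
    "Grazie mille, arrivederci.",
)

# Toy clip at 10 samples per second: bookend, pause, target line, pause, bookend.
clip = [0.5] * 5 + [0.0] * 4 + [0.9] * 6 + [0.0] * 4 + [0.5] * 5
target = trim_bookends(clip, rate=10)
```

Cutting at the first and last interior pause means pauses inside the English line itself are preserved. A clip without two clear pauses is returned untouched, which fits the summary's point that many takes still needed regeneration or hand cleanup.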

Where AI failed, and what that failure revealed

One character, Maria Johnson, needed a strong Southern accent, and at first the AI just couldn’t do it. Harms hired a human voice actor, but says the work was slow, expensive, and low quality, costing more than the entire rest of the project combined. He still ended up using a subpar AI voice in book one, then replaced it with a much better AI version by book three, less than a year later.

The replacement question gets personal

This is the moral center of the episode: Harms says AI empowers him to make things that otherwise wouldn’t exist, from audiobooks to custom book covers, but he knows that doesn’t erase the social consequences. Once his polished versions are out there, why would anyone organize a fan production, and what happens to voice actors and cover artists when machines are cheaper, faster, and increasingly good enough?

Face arrives through the attention economy

He broadens the warning from labor to culture, saying the bigger danger may be AI systems that seduce rather than simply automate. He describes landing in Japan and seeing multiple people scrolling AI-generated images and videos, then connects that to AI companions, porn, parasocial attachment, and his novel’s central idea: a superhuman entity competing for, and increasingly winning, human attention.

Red Heart, V3 voices, and the sad future of frictionless art

When he later returned to ElevenLabs for Red Heart, the jump was dramatic: V3 voices were smoother, more emotional, and responsive to lightweight prompts like "laughing" or "whispering," so his workflow shifted from line-by-line regeneration to scene-level splicing in Audacity. He thinks that within 10 years you’ll hand a novel to an AI and get back a better, cheaper audiobook than he can currently produce. That prospect saddens him: he loves directing these works, and he fears a future of endlessly personalized, machine-made art wrapped around every person like a tailored succubus.
