AI EngineerMay 23, 20261h 54m

Prompt to Pipeline: Building with Google's Gen Media Stack — Paige & Guillaume, Google DeepMind

TL;DR

Paige’s big warning: don’t overbuild around temporary model limitations — She argued that whole product categories people rushed into—vector databases for 8k–16k token limits, language fine-tunes, agent frameworks, even MCP servers—are getting absorbed by stronger base models and simpler abstractions like “skills,” with Med-PaLM/MedLM users now often just using Gemini plus retrieval or prompting.
Google’s pitch is a full-stack, multimodal pipeline—not just a chat model — Paige framed the last 6 weeks as a release blitz including Gemini 3.1 Flash Live, Gemini 3.1 Pro and Flash Light, Nano Banana 2, VO 3.1 Light, Lyria 3, Genie 3, Gemini Embedding 2.0, and Gemma 4, all tied together through AI Studio and code export.
AI Studio is meant to turn demos into product code with almost no translation step — In one session Paige analyzed a 5-minute YouTube dinosaur video with Gemini 3.1 Flash Light, got timestamped species notes, then hit “Get code” to export the exact setup in TypeScript or Python; she summed it up as: if it works in AI Studio, it can work in your app.
The cheapest model kept showing up as ‘good enough’ for serious multimodal work — Paige highlighted Gemini 3.1 Flash Light at roughly $0.25 per million tokens analyzed, using it for video understanding and Python-backed vision tasks like drawing bounding boxes around green Lego bricks, while still keeping those workflows in the “pennies” range.
The live demos were less about single models and more about stitched systems — Paige showed Gemini Live doing screen and camera sharing in real time, then jumped to Genie 3, which composes Nano Banana, VO, Gemini prompting, and more into a 60-second playable world—like a pink sparkly pirate squirrel hopping through Regent’s Canal with pirate-flag boats and dolphins.
Guillaume and Ian both reinforced the same thesis from different sides: prompting is becoming orchestration — Guillaume used Gemini to generate prompts for Nano Banana, VO, Lyria, and TTS to turn The Wind in the Willows into a mini illustrated audiovisual production, while Ian showed Gemma 4 running locally on phones and laptops, driving agent skills, coding tools, and even a simple game-builder app.

Summary

Paige’s origin story, and why she still bets on open source

Paige opens with a career arc that starts in the 2009–2010 scientific Python era—NumPy, SciPy, Matplotlib, scikit-learn—when, as she puts it, it still felt “kind of wild” that companies trusted open source for business-critical work. She traces that into Chevron geoscience, early GPU work, TensorFlow’s original CPU-only release, and eventually Google, GitHub, PaLM 2, Gemini, and Gemma, saying flatly: “I owe my entire career to open source software.”

Her hot take: if everyone is sprinting, it’s probably the wrong thing

Before the official demo starts, Paige drops the sharpest product lesson of the session: builders often rush to patch model weaknesses that won’t stay weaknesses for long. Her examples are brutally specific—vector DBs for tiny context windows, language fine-tunes, agent frameworks, and MCP servers—arguing that many of these get swallowed by better base models, with “skills” now replacing a lot of custom infrastructure.

The Google release blitz and the multimodal thesis

She then speed-runs a huge list of recent Google DeepMind launches: Gemini 3.1 Flash Live, Gemini 3.1 Pro, Flash Light, Nano Banana 2, Gemini Embedding 2.0, Lyria 3, Genie 3, Gemma 4 under Apache 2, and VO 3.1 Light. The throughline is that Gemini isn’t just multimodal on input; it also outputs text, code, images, interleaved image+text, and audio tokens, which is what makes the rest of the demos feel like one connected stack instead of separate point products.

AI Studio as the “if it works here, it works in your app” machine

Paige’s first live demo is delightfully practical: paste in a YouTube dinosaur video, sample frames, ask Gemini 3.1 Flash Light for a timestamped dinosaur table with fun facts, and then export the exact setup with “Get code.” She hammers on the economics too—Flash Light at about $0.25 per million tokens analyzed—and follows with a code-execution demo where Gemini uses a sandboxed Python environment to draw bounding boxes around green Lego bricks for pennies.

Build mode, live voice, and the bookshelf app that almost broke on stage

From there she moves into AI Studio’s Build feature, essentially Google’s answer to v0 or Lovable, and prompts it to create a bookshelf-cataloging app with Google login, image upload, search grounding, and a database. The first run stumbles on Firestore permissions, but that becomes part of the show: you watch the model inspect rules, patch config, and eventually produce a working “Shelf Scan AI” app that identifies books, fills in missing metadata, and persists results to the user account.

Gemini Live and the pink-sparkly-squirrel world model demo

Paige gives a quick Gemini Live demo—screen sharing, camera input, multilingual replies, even a little poem after the model counts her fingers—before switching to Genie 3, which she describes as “bonkers.” The memorable bit is her generated world: Regent’s Canal on a sunny day, dolphins in the canal, pirate flags on every boat, and a pink sparkly squirrel in a pirate hat bouncing through the scene, all generated frame-by-frame without a traditional game engine.

Guillaume turns a public-domain book into images, video, music, and voices

Guillaume takes over with the gen media stack and uses The Wind in the Willows from Project Gutenberg as the backbone for a full pipeline demo. He shows how Gemini generates structured prompts, Nano Banana keeps character consistency, VO turns chapter illustrations into animated scenes, Lyria creates chapter-specific music, and Google’s TTS model stages multi-character dialogue by encoding voice style directly in the script—his point being that Gemini is already good at prompting these models because so much of the internal training loop runs through Gemini itself.

Ian’s Gemma 4 finale: local models, agent skills, and a game that builds a game

Ian closes by speed-running Gemma 4: E2B and E4B for phones and small devices, plus 26B and 31B models for laptops or single-GPU setups. He demos Google AI Edge Gallery with on-device “agent skills,” then shows Gemma running locally through LM Studio and OpenAI-compatible endpoints, farming out SVG tasks to subagents and using OpenCode to build apps—ending with the crowd-pleasing moment where a prompt for “a game where you can build your own game” actually spits out a tiny level editor and playable triangle-on-screen prototype.

Was This Useful?

LinkedIn X Email

Keep Reading

Tune your feedFive quick questions, and the feed ranks what matters to you first.

Or just get notified

The weekly Echo. Signal worth keeping in your inbox.

Every new piece, announced on X.

Follow @alcreon on X

Prompt to Pipeline: Building with Google's Gen Media Stack — Paige & Guillaume, Google DeepMind

Summary

Paige’s origin story, and why she still bets on open source

Her hot take: if everyone is sprinting, it’s probably the wrong thing

The Google release blitz and the multimodal thesis

AI Studio as the “if it works here, it works in your app” machine

Build mode, live voice, and the bookshelf app that almost broke on stage

Gemini Live and the pink-sparkly-squirrel world model demo

Guillaume turns a public-domain book into images, video, music, and voices

Ian’s Gemma 4 finale: local models, agent skills, and a game that builds a game

Was This Useful?

Or just get notified

Read Next

The Retirement Email Isn't a Warning

The Cheapest Model That Passes

Cheap Models, Hard Tasks

Summary

Paige’s origin story, and why she still bets on open source

Her hot take: if everyone is sprinting, it’s probably the wrong thing

The Google release blitz and the multimodal thesis

AI Studio as the “if it works here, it works in your app” machine

Build mode, live voice, and the bookshelf app that almost broke on stage

Gemini Live and the pink-sparkly-squirrel world model demo

Guillaume turns a public-domain book into images, video, music, and voices

Ian’s Gemma 4 finale: local models, agent skills, and a game that builds a game

Was This Useful?

Make Alcreon Yours

Or just get notified

Read Next

The Retirement Email Isn't a Warning

The Cheapest Model That Passes

Cheap Models, Hard Tasks