AI Engineer · 8h 2m

AIE Miami Day 2 ft. Cerebras, OpenCode, Cursor, Arize AI, and more!

TL;DR

  • G2I’s big claim: agent adoption is a learned workflow, not a personality trait — David House used four case studies—Ava, Lucy, Antoine, and Dale—to argue that engineers move from “slop cannon” fear to effective delegation by internalizing staged practices like product brief → tech spec → code/test/review, not by magically being “good at AI.”

  • Speed is becoming the bottleneck-breaker for coding agents — Cerebras’ Sarah Chiang said model quality has soared while inference speed has stayed stuck around 50–150 tokens/sec, creating “latency debt,” and framed OpenAI + Cerebras’ Codex Spark at 1,200 tokens/sec as a regime change that turns coding models into real-time pair programmers instead of lunch-break batch jobs.

  • Specialized sub-agents are emerging as the practical architecture for agentic coding — Morph founder Tis argued that coding agents spend most of their time on tasks that don’t need frontier models—like code search, compaction, and diff application—and claimed speed gains map directly to product outcomes, with conversion rates roughly doubling when latency is cut in half without hurting accuracy.

  • The conference repeatedly returned to one theme: context quality matters more than model hype — Across talks, from G2I’s document handoffs to Morph’s isolated context windows to Nia Mulin’s knowledge-graph framing, the point was the same: better models don’t fix broken delegation, broken retrieval, or broken context assembly.

  • On-device AI is getting weirdly real — Le Kalinowski demoed a diffusion pipeline running fully offline on a smartphone NPU in airplane mode, using ambient sensor values instead of text prompts and hitting roughly 600 ms latency, as a proof that mobile NPUs can support practical local generative UX without cloud calls.

  • Coding agents are escaping developer tooling and becoming general software primitives — Agentuity’s Rick Blalock traced the shift from Auto-GPT-era “orchestration theater” to 2026 reality, arguing that non-technical operators now use coding agents to replace chunks of HubSpot, manage marketing ops, and build business software directly—evidence that coding agents are not just writing software but increasingly becoming it.

The Breakdown

Day 2 opens with hype, jokes, and a very specific challenge

The MCs kicked things off like people who knew the crowd was tired but still game: asking who got drunk, who made more than five LinkedIn connections, and who counts as a “world-class engineer.” The mood was half pep rally, half support group for AI builders, with the throughline that everyone in the room should learn hard and talk to strangers because that’s how the field moves.

G2I’s David House makes agent adoption sound more like therapy than tooling

House opened with the perfect credential twist: he’s a software engineering manager with a background in mental health counseling, and he used that lens to study why engineers fear coding agents. Through stories about Ava, Lucy, Antoine, and Dale, he showed how people go from “I don’t trust this thing with my reputation” to writing their own prompts and sub-agents once they learn to encode engineering judgment into briefs, specs, tests, reviews, and constrained delegation. His core line landed cleanly: beginners need frameworks that constrain their input; experts need frameworks that amplify it.

Cerebras says coding got smarter, but not faster—and that’s been the real tax

Sarah Chiang gave one of the day’s clearest macro arguments: model quality, context windows, and reasoning tokens have all exploded, but generation speed has stayed flat at roughly 50–150 tokens per second. She called that accumulated drag “latency debt,” then used Codex Spark—released with OpenAI and running at 1,200 tokens/sec—as the proof that the stack is finally being re-architected, from hardware to model design to KV-cache tricks and disaggregated inference. The punchline was less “look how fast” and more “look how this changes behavior”: developers can stay in the loop, steer outputs live, and stop generating giant piles of unchecked code.
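The "latency debt" framing is easy to check with back-of-the-envelope arithmetic. A minimal sketch (token budget and labels are illustrative, not benchmarks from the talk):

```python
# Wall-clock time to stream a fixed token budget at the decode speeds
# mentioned in the talk: the 50-150 tok/s status quo vs. Codex Spark's
# claimed 1,200 tok/s. The 30k-token budget is an invented example.

def generation_time(tokens: int, tokens_per_sec: float) -> float:
    """Seconds to stream `tokens` at a given decode speed."""
    return tokens / tokens_per_sec

BUDGET = 30_000  # e.g., a multi-file refactor's worth of output

for label, speed in [("low end", 50), ("high end", 150), ("Codex Spark", 1_200)]:
    minutes = generation_time(BUDGET, speed) / 60
    print(f"{label:>12}: {speed:>5} tok/s -> {minutes:5.1f} min")
```

At 50 tok/s that budget takes ten minutes, which is exactly the lunch-break batch job Chiang described; at 1,200 tok/s it takes under half a minute, short enough to steer live.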

A physicist demos ambient AI on a phone, fully offline, with sensor-driven diffusion

Le Kalinowski’s talk had the energy of a research lab demo smuggled onto a conference stage. He explained how he deployed diffusion models directly onto smartphone NPUs, stripped out the usual text-to-embedding path, and instead fed in direct numerical readings from ambient sensors, such as light levels, to drive image generation—then proved it by putting his personal phone in airplane mode and running it live. The images were abstract and constrained, but that was the point: local generative interfaces can be fast, stable, and useful without cloud dependencies.
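The interesting architectural move is replacing the text encoder with sensor readings. A hypothetical sketch of that idea, where raw readings are normalized into a small conditioning vector (all ranges, names, and the packing scheme here are invented for illustration, not Kalinowski's pipeline):

```python
# Pack ambient sensor readings into a fixed-size conditioning vector in
# [0, 1], standing in for the text embedding a diffusion model normally
# receives. Sensor ranges below are illustrative guesses.
import math

def sensor_conditioning(lux: float, temp_c: float,
                        accel: tuple[float, float, float],
                        dim: int = 8) -> list[float]:
    """Map raw readings to a dim-length vector of values in [0, 1]."""
    motion = math.sqrt(sum(a * a for a in accel))
    features = [
        min(math.log1p(lux) / math.log1p(100_000), 1.0),  # log-scaled light
        min(max((temp_c + 20) / 70, 0.0), 1.0),           # rough -20..50 C range
        min(motion / 20.0, 1.0),                          # motion magnitude
    ]
    # Tile to the conditioning width the model expects.
    return (features * ((dim // len(features)) + 1))[:dim]

vec = sensor_conditioning(lux=320.0, temp_c=24.5, accel=(0.1, 0.0, 9.8))
```

Because the inputs are already numbers, this path skips the tokenizer and text encoder entirely, which is part of what makes a ~600 ms on-device loop plausible.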

Morph’s Tis argues the future belongs to sub-agents and purpose-built models

Tis, a former Tesla Autopilot engineer, framed the moment as “software 3.5”: not humans prompting models, but agents prompting other agents. His argument was that too many coding systems waste frontier-model compute on repetitive chores like search, compression, and applying diffs, so Morph trains specialized models for those jobs and leaves planning and reasoning to bigger models. He mixed deep inference nerding—speculative decoding, disaggregated prefill, custom kernels—with a product claim that got attention: if you double speed without hurting accuracy, conversion rates roughly double too.
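The sub-agent division of labor Tis described can be sketched as a simple routing table: repetitive chores go to small purpose-built models, and anything else falls through to the frontier model. A minimal sketch, assuming hypothetical task kinds and model names (none of these are Morph's actual identifiers):

```python
# Route agentic-coding chores to cheap specialized models; reserve the
# frontier model for planning and reasoning. All names are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class SubAgent:
    name: str
    model: str  # hypothetical model id, not a real endpoint

ROUTES = {
    "code_search": SubAgent("search", "small-retrieval-model"),
    "apply_diff":  SubAgent("apply", "fast-edit-model"),
    "compaction":  SubAgent("compact", "summarizer-model"),
}
FRONTIER = SubAgent("planner", "frontier-model")

def route(task_kind: str) -> SubAgent:
    """Unrecognized (i.e., open-ended) tasks go to the frontier model."""
    return ROUTES.get(task_kind, FRONTIER)
```

The point of the pattern is that each sub-agent also gets its own isolated context window, so a diff-application call never pays for, or pollutes, the planner's accumulated context.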

Agentuity says coding agents are eating software itself

Rick Blalock gave the historical sweep talk, walking from Auto-GPT’s chaotic promise through framework-heavy “orchestration theater” to today’s far more grounded coding agents. His strongest point wasn’t technical—it was sociological: non-engineers now understand agent value through tools like OpenClaw and Devin because they’re using coding agents to replace business software, run operations, and build custom workflows themselves. His thesis was memorable and blunt: Marc Andreessen said software ate the world; now coding agents are eating software.

The next frontier: context engineering that can explain itself

Nia Mulin turned the room from vibes to compliance. Starting with a fictional-but-painfully-plausible banking example—Jessica requests a $25,000 credit increase, her employer is on a sanctions list, the agent approves it anyway—she argued that the core failure is not retrieval quantity but missing relationships in context. Using knowledge graphs and citing a telecom study that reportedly moved QA accuracy from 37% to 54% with fine-tuning and then to 91% with graph + RAG, she made the case that “better models don’t fix fractured context,” and that agents need auditable context graphs if they’re going to make real-world decisions.
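The Jessica example can be made concrete with a toy graph walk. Flat retrieval sees the credit application and the sanctions list as unrelated documents; a graph traversal connects applicant → employer → sanctions entry, and the edge path itself is the audit trail. A minimal sketch with invented data:

```python
# Toy knowledge graph: (subject, predicate) -> object edges. The entities
# and the approval rule are invented to mirror the talk's banking example.
GRAPH = {
    ("jessica", "employed_by"): "acme_corp",
    ("acme_corp", "listed_on"): "sanctions_list",
}

def related(entity: str, target: str, graph: dict) -> bool:
    """Walk outgoing edges from `entity`, looking for `target`."""
    frontier, seen = [entity], set()
    while frontier:
        node = frontier.pop()
        if node == target:
            return True
        seen.add(node)
        for (subj, _pred), obj in graph.items():
            if subj == node and obj not in seen:
                frontier.append(obj)
    return False

def approve_credit_increase(applicant: str) -> bool:
    # Denial is explainable: the connecting edges can be cited verbatim.
    return not related(applicant, "sanctions_list", GRAPH)
```

A retrieval step that only matched on "Jessica" would never surface the sanctions list; the two-hop relationship is exactly the "missing relationships in context" Mulin argued better models cannot recover on their own.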