AI Engineer · 1h 43m

Vibe Engineering Effect Apps — Michael Arnaldi, Effectful

TL;DR

  • Arnaldi’s core trick is brutally simple: clone the library repo into your project — instead of hoping the model understands docs or MCP tools, he adds effect as a git subtree in .repos/effect so the coding agent treats it like first-party code and copies real patterns.

  • LLMs don’t “learn” your preferences unless you encode them into the repo — he explains that models are fixed after training, so memory has to come from context, agents.md, generated pattern files, lint rules, and repo structure rather than repeated chat instructions.

  • Backpressure beats prompting: lint rules and errors are how you keep agents honest — Arnaldi turns all diagnostics into errors, bans shortcuts like as, any, and unknown, and even writes custom ESLint rules when models start sneaking in hacks like as never as X.

  • He built a working Effect v4 todo API from scratch in 90 minutes without hand-coding — using GPT-5.4, Bun, Vitest, Effect SQL, SQLite, and OpenAPI, he had the model research patterns first, then implement CRUD endpoints, tests, migrations, and generated docs.

  • His workflow is “spec-driven development” plus constant context resets, not giant autonomous runs — he prefers small markdown specs, fresh sessions to avoid context pollution, and simple bash-loop automation over elaborate agent architectures, because “with AI many times less is more.”

  • The bigger point is operational, not just ergonomic: AI apps need durable workflows — he closes by arguing that long-running LLM tasks make failure inevitable, which is why Effect’s clustering and workflow primitives matter for things like registration flows, email delivery, and resilient AI-powered processes.

The Breakdown

“Just clone the repo” as the workshop thesis

Michael Arnaldi opens by saying he prepared “absolutely nothing” because vibe engineering has to be real, then immediately lands on the point of the whole session: this should really be called “just clone the [__] repo.” He says he hasn’t written code by hand since late summer, even for low-level TypeScript and Rust library work, which surprised him because he assumed AI would only really help in app-land.

Why models feel smart but still forget everything

He gives a crisp mental model for coding agents: they don’t learn the way humans do; they operate inside a fixed-size context window on top of stale pretraining. Even a 1-million-token context window can hurt if you stuff it with unrelated material, so the job becomes architecting around a “dumb process” that needs the right context every time.

Why docs and MCP aren’t enough for Effect

Arnaldi argues coding models were trained mainly to consume and emit code, not to navigate human docs or random MCP servers. That’s why he started cloning dependency repos directly into projects: agents skip node_modules and anything matched by .gitignore, but they treat a checked-in subtree as part of the codebase, so the model actually explores it and imitates upstream patterns.
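As a rough illustration of the subtree setup, the sketch below builds the arguments for a `git subtree add` call that checks a dependency into `.repos/effect`. The repository URL, the `main` branch, the `--squash` flag, and the helper name `subtreeAddArgs` are assumptions made for this example; the talk summary only names the `.repos/effect` location.

```typescript
// Hypothetical helper: builds the argument list for `git subtree add`.
// The prefix is the in-repo directory where the dependency source will live.
function subtreeAddArgs(prefix: string, repository: string, ref: string): string[] {
  // --squash (an assumption here) collapses upstream history into one commit.
  return ["subtree", "add", `--prefix=${prefix}`, repository, ref, "--squash"]
}

// Assumed upstream URL and branch; adjust to the dependency you want to vendor.
const effectSubtree = subtreeAddArgs(
  ".repos/effect",
  "https://github.com/Effect-TS/effect.git",
  "main"
)

// The full command line, ready to paste into a shell at the project root.
const subtreeCommand = `git ${effectSubtree.join(" ")}`
```

Because the subtree is committed rather than ignored, the agent sees the library source through the same file search it uses for the rest of the project.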

Bootstrapping the repo with GPT-5.4, Bun, and TypeScript Go

He spins up an empty Bun project live, joking that if the model derails he’ll start insulting it because “it cannot really answer you back.” Along the way he compares model behavior: old Sonnet 4 was “a kid with a knife running through the house,” GPT-5.4 is slower but more solid, and Anthropic’s policy restrictions pushed him toward OpenAI despite Opus still being stronger on some UI work.

Encoding guardrails into the repo so the model can’t cheat

Once the basics are working, he adds agents.md, turns diagnostics into hard errors, and stresses that this is what programming becomes now: shaping repositories so models can perform well at scale. He shows how his own projects evolved custom lint rules to stop bad habits — banning as, any, unknown, even catching the model’s workaround of using as never as X — which he compares to babysitting “a junior developer with a knife running through the kitchen.”
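One way to express bans like these without writing a custom plugin is ESLint's built-in `no-restricted-syntax` rule with AST selectors. The sketch below is not Arnaldi's actual rule set: the selectors assume the typescript-eslint parser is configured elsewhere (so that `TSAsExpression`, `TSAnyKeyword`, and `TSUnknownKeyword` nodes exist), and the messages and the `files` glob are invented for illustration.

```typescript
// Each entry pairs an AST selector with the message shown to the agent.
const restrictedSyntax = [
  { selector: "TSAsExpression", message: "Type assertions with `as` are banned." },
  // Redundant once all `as` is banned, but it names the double-assertion hack explicitly.
  {
    selector: "TSAsExpression > TSAsExpression",
    message: "Chained assertions such as `as never as X` are banned.",
  },
  { selector: "TSAnyKeyword", message: "`any` is banned; model the real type." },
  { selector: "TSUnknownKeyword", message: "`unknown` is banned; model the real type." },
]

// Shaped like one entry of an ESLint flat config array.
const banTypeEscapes = {
  files: ["**/*.ts"],
  rules: {
    // "error" makes every match fail the lint run, which is the backpressure.
    "no-restricted-syntax": ["error", ...restrictedSyntax],
  },
}
```

Note that banning every `TSAsExpression` also rejects `as const`, so a real configuration may want a narrower selector.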

Research first, then implement: pattern files for HTTP APIs and SQL

Instead of asking the agent to build immediately, he has it inspect the Effect repo and write patterns/http-api.md, then later patterns/sql.md and patterns/testing.md. He calls this spec-driven development: generate a markdown plan, restart sessions often to avoid context pollution, and feed the model small, precise tasks rather than one giant prompt.
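The loop this describes can be sketched as follows, under loud assumptions: the spec is a markdown checklist with one `- [ ]` line per task, and `runFreshSession` is a hypothetical callback standing in for however a new agent session is launched. Neither detail comes from the talk summary.

```typescript
// Assumed spec format: one task per unchecked markdown checkbox line.
const uncheckedTask = /^- \[ \] (.+)$/m

// Return the first task that is not yet checked off, if any.
function nextTask(spec: string): string | undefined {
  const match = uncheckedTask.exec(spec)
  return match === null ? undefined : match[1]
}

// Tick the checkbox for a finished task.
function markTaskDone(spec: string, task: string): string {
  return spec.replace(`- [ ] ${task}`, () => `- [x] ${task}`)
}

// One small task per session, so no session inherits stale context.
function runSpec(spec: string, runFreshSession: (prompt: string) => void): string {
  let current = spec
  for (let task = nextTask(current); task !== undefined; task = nextTask(current)) {
    runFreshSession(`Read patterns/*.md, then do only this task: ${task}`)
    current = markTaskDone(current, task)
  }
  return current
}

// Usage with a recording callback in place of a real agent.
const todoSpec = "# Todo API\n- [x] scaffold project\n- [ ] add create endpoint\n- [ ] add list endpoint\n"
const prompts: string[] = []
const finishedSpec = runSpec(todoSpec, (prompt) => {
  prompts.push(prompt)
})
```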

Building the todo API and fixing the agent’s weird choices

Using those patterns, the model assembles a todo API with create, update, list, and done/not-done flows backed by Effect SQL and SQLite, plus OpenAPI docs and tests. Arnaldi stays in the loop, flagging problems in real time: duplicate code, plain string IDs instead of branded types, unnecessary test wrappers, and the classic agent move of changing a test just to make it pass.
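To make the "branded types instead of plain string IDs" point concrete, here is a minimal plain-TypeScript sketch. It deliberately does not use Effect's own branding facilities; the names `TodoId`, `isTodoId`, `parseTodoId`, and `markDone` are hypothetical, and the non-empty check is an assumed validation rule.

```typescript
// A string that carries a compile-time tag; the tag does not exist at runtime.
type TodoId = string & { readonly __brand: "TodoId" }

// Type guard: the only place a raw string can become a TodoId.
function isTodoId(value: string): value is TodoId {
  return value.trim().length > 0
}

function parseTodoId(value: string): TodoId {
  if (!isTodoId(value)) {
    throw new Error("Invalid todo id: expected a non-empty string")
  }
  return value
}

interface Todo {
  readonly id: TodoId
  readonly title: string
  readonly done: boolean
}

// Accepts only a TodoId, so other strings cannot be passed by mistake.
function markDone(todos: ReadonlyArray<Todo>, id: TodoId): Todo[] {
  return todos.map((todo) => (todo.id === id ? { ...todo, done: true } : todo))
}

const todoId = parseTodoId("todo-1")
const updatedTodos = markDone([{ id: todoId, title: "Write spec", done: false }], todoId)
// markDone(todos, "todo-1") would not compile: a plain string is not a TodoId.
```

The guard-based constructor avoids an `as` cast, which keeps the sketch compatible with the lint bans described earlier.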

The ending point: a real app, and why workflows matter next

By the end they have a working API, a start command, OpenAPI docs, and a pushed public repo, all from an empty project and “zero Effect knowledge” at the start. He closes by zooming out: the real next step is durable workflows and clustering, because AI makes processes long-running, and once a request lasts a minute instead of 10 ms, failures become unavoidable — which is exactly why systems like Temporal and Effect’s workflow stack suddenly matter so much.
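The core idea behind durable workflows can be shown with a toy journal: each completed step records its result, so a retry after a failure resumes instead of starting over. This is only a sketch of the concept, not the API of Effect's workflow stack or of Temporal; the in-memory `Map` stands in for persistent storage, and every name here is hypothetical.

```typescript
// In-memory stand-in for a persisted journal; a real system would use a database.
type Journal = Map<string, string>

// Run a step once; on a retry, reuse the recorded result instead of redoing the work.
function runStep(journal: Journal, stepName: string, step: () => string): string {
  const recorded = journal.get(stepName)
  if (recorded !== undefined) return recorded
  const result = step() // if this throws, nothing is recorded and the step reruns later
  journal.set(stepName, result)
  return result
}

// Hypothetical registration flow: create the user, then send a welcome email.
function registerUser(
  journal: Journal,
  email: string,
  createUser: (email: string) => string,
  sendEmail: (to: string) => string
): string {
  const userId = runStep(journal, "create-user", () => createUser(email))
  const receipt = runStep(journal, "send-welcome-email", () => sendEmail(email))
  return `${userId} ${receipt}`
}

// Usage: the first attempt fails at the email step, the retry resumes from there.
const journal: Journal = new Map()
let usersCreated = 0
const createUser = (email: string): string => {
  usersCreated += 1
  return `user:${email}`
}

let firstAttemptFailed = false
try {
  registerUser(journal, "ada@example.com", createUser, () => {
    throw new Error("mail server unavailable")
  })
} catch {
  firstAttemptFailed = true
}

const outcome = registerUser(journal, "ada@example.com", createUser, (to) => `sent:${to}`)
```

On the retry the user is not created a second time, which is the property that matters once a single request is long enough for mid-flight failures to be routine.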
