Back to Podcast Digest
Alex Finn24m

Hermes Agent powered by local models on the DGX Spark is basically magic

TL;DR

  • Alex Finn’s pitch is simple: local agents feel like a “24/7 AI employee” — he sets up Hermes Agent on an Nvidia DGX Spark using a fully local Qwen 3.6 27B model so it runs privately, offline-capable, and without per-token cloud costs.

  • He argues local models win on cost, privacy, and control — instead of paying subscriptions or API fees, you pay for the machine and electricity, keep chats on-device, and can customize models with LoRAs to match your voice or tasks.

  • The DGX Spark is framed as the easiest serious box for this workflow — Finn says you can run it headless, connect over its local network, add Tailscale for remote control, and let Hermes manage installs and setup from your main computer.

  • The actual setup is surprisingly lightweight once Hermes is in the loop — his flow is: install Hermes on your main machine with a cloud model first, prompt it to configure the Spark and Tailscale, then have it download and load Qwen 3.6 27B, which took him about 20 minutes.

  • The demo moment is the emotional hook: a local chat UI answering from the box on his desk — after Hermes builds a front end, he types “Hey, are you there?” and gets “Yes, I’m here,” which he treats as the “holy crap” moment of hosting your own intelligence.

  • He closes with three concrete use cases that scale from simple to ambitious — a 9:00 a.m. daily AI-stock cron report, automatic YouTube transcript-to-newsletter repurposing, and vibe-coding a polished to-do app locally, which he contrasts with cloud coding tools that can cost thousands per month.

The Breakdown

“This is basically magic”: the local AI employee pitch

Finn opens at full hype: he’s got Hermes Agent running on an Nvidia DGX Spark, powered entirely by a local model, and he keeps hammering the core benefits — private, secure, and effectively free beyond electricity. He frames it less as a toy and more as a permanent “24/7 AI employee” sitting on your desk.

Why he thinks local models are the future

Before touching setup, he makes the case for local models: no token fees, no cloud-stored prompts, deep customization with LoRAs, and the educational upside of learning how AI actually works. The tone is half practical, half philosophical — you should be allowed to do this because it’s useful, but also because it’s just plain fun to know “super intelligence” is running on your machine.

Why the DGX Spark specifically made this easy

Finn says he owns plenty of hardware, but likes the DGX Spark because it’s basically plug-in-and-go and gives access to Nvidia’s developer stack for deeper customization. He also makes a point of saying Nvidia didn’t send him this one — he bought it months earlier — which is his way of signaling this isn’t just sponsor copy.

Headless setup, Tailscale, and the “non-technical” rant

The setup flow is practical: plug the Spark in, run it headless without a monitor, connect through the network info in the manual, and use Hermes on your main computer to walk through the rest. His key prompt asks Hermes to configure the Spark and install Tailscale so the device can be controlled from anywhere, and that tees up his most animated line in the video: stop calling yourself “non-technical,” because with AI agents, that label belongs “in the garbage can.”

Loading Qwen 3.6 27B and the sovereignty moment

For the model, he picks Qwen 3.6 27B, calling it the strongest local option right now: fast, efficient, and surprisingly close to frontier models. Hermes finds the right build, downloads it, loads it into memory in roughly 20 minutes, and Finn turns that into a mini victory speech — if this is your first local setup, you’re now part of the tiny fraction of people who’ve put “super intelligence on your desk,” with no one able to cut you off.

The first live test: a chat UI talking back locally

Before wiring it into agent workflows, he has Hermes build a simple front-end chat interface just to prove the local model is alive. The response — “Yes, I’m here. How can I help you today?” — lands like the emotional payoff of the whole tutorial, and Finn is visibly delighted by the fact that the reply came straight from the machine on his desk.

Turning that model into a second Hermes worker

Next he uses Hermes’s multi-agent support to create a second profile connected to the local Qwen model and names it “Quen.” When it first replies “I’m actually Hermes agent, not Quen,” he laughs through the rough edge, fixes the memory by explicitly naming it, and shows Hermes logging the memory update — a small but very human demo of agent setup in the wild.

Three use cases: investing, content repurposing, and vibe coding

He walks through three escalating examples. First: a beginner-friendly 9:00 a.m. scheduled AI-stock report for companies with long-term AI moats, powered by a cron job. Second: an intermediate content workflow where Hermes grabs a YouTube transcript and repurposes it into a newsletter — or, in a more interesting twist, continuously scans new AI videos every hour so the system can “self-improve” without racking up cloud costs. Third: the flashy demo, vibe-coding a to-do list app with priorities, dates, filters, and slick animations, which he says feels like getting unlimited coding help for the cost of electricity instead of paying expensive cloud tool bills.

Share