
Hugging Face is turning agents into actual ML operators: Merve Noyan shows agents that can now kick off fine-tunes, launch jobs, explore datasets, build demos, and even choose infrastructure via Hugging Face “skills,” driven by prompts like “train Qwen 2.5 VL on this dataset for me.”
Open models are no longer the consolation prize: she points to the Artificial Analysis Intelligence Index, says open models have effectively caught up, and names GLM 5.1 as a standout that she personally uses for coding and that ranks highly on benchmarks like SWE-bench Pro.
Vision is becoming the default for agentic models: Noyan argues labs are increasingly shipping VLMs on day zero, citing Gemma 4, Qwen 3.5, and Kimi K2.5, because vision-capable models can act like computer-use agents over screenshots and UI flows.
Local open-source agents are now easy enough to feel boring, in a good way: she highlights Pi, llama.cpp’s built-in llama-agent binary, Hermes Agent, GGUF quantization, and Hugging Face’s “Use this model” flow as making local serving and coding agents dramatically less “frictiony.”
Agent traces are becoming training data, not just logs: Hugging Face has a new dataset repo type called “traces” that can host sessions from tools like Codex, Claude Code, or Pi, parse them in the viewer, and later feed them back into model training.
The most sci-fi moment is agents doing the infra math for you: in her examples, the agent estimates VRAM, asks about the validation split, chooses an instance, calculates cost, writes the training or OCR job script, and leaves you with a finished model on the Hub.
Merve opens with a mini manifesto: in ML, openness is not one thing but a spectrum. It runs from open weights with non-commercial terms, to truly open-source models under MIT or Apache 2.0, to fully open stacks where the harness and agent code are exposed too. Her practical point is sharp: when a cloud-hosted model's performance silently degrades, open systems let you see the degradation and keep control. You can quantize the model, fine-tune it, and even deploy it to edge devices or browsers for better privacy.
She pushes back on the old “open models aren’t good enough” narrative, calling it outdated and singling out GLM 5.1 as “absolutely crashing it” and as part of her own coding setup. The Hugging Face Hub, now nearing 3 million models, serves as her operating system for all of this: not just model hosting, but also the inference layer and discovery surface for the open ecosystem.
One clear trend she sees: agentic models are increasingly vision-first. She splits the field into LLMs and VLMs, then argues VLMs are especially powerful because they can operate like computer-use agents over screenshots, knowing where to click; Gemma 4, Qwen 3.5, and Kimi K2.5 are her examples of labs shipping vision capabilities from day zero.
With millions of models available, she says picking one used to be a mess, so Hugging Face added benchmark datasets directly into the datasets UI. You can now click into SWE-bench Pro, Humanity’s Last Exam, AIME, and others to see ranked open models, then “vibe check” them through Inference Providers, which route requests across vendors like Groq and Cerebras and expose columns like cheapest, fastest, and tool use.
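The "cheapest", "fastest", and "tool use" columns amount to a filter followed by a sort. The sketch below shows that selection logic in plain Python. It is only an illustration: the provider names, prices, and throughput figures are made up, and it does not call the Hugging Face API or reflect how the Hub computes its columns.

```python
# Made-up rows standing in for the provider table; names and numbers are not real.
PROVIDERS = [
    {"name": "provider-a", "usd_per_mtok": 0.60, "tokens_per_s": 450, "tool_use": True},
    {"name": "provider-b", "usd_per_mtok": 0.25, "tokens_per_s": 120, "tool_use": False},
    {"name": "provider-c", "usd_per_mtok": 0.80, "tokens_per_s": 900, "tool_use": True},
]


def _pool(rows, need_tools):
    """Keep every row, or only tool-capable rows when need_tools is True."""
    return [r for r in rows if r["tool_use"] or not need_tools]


def cheapest(rows, need_tools=False):
    """Row with the lowest price per million tokens."""
    return min(_pool(rows, need_tools), key=lambda r: r["usd_per_mtok"])


def fastest(rows, need_tools=False):
    """Row with the highest throughput in tokens per second."""
    return max(_pool(rows, need_tools), key=lambda r: r["tokens_per_s"])


print(cheapest(PROVIDERS)["name"])                   # provider-b
print(cheapest(PROVIDERS, need_tools=True)["name"])  # provider-a
print(fastest(PROVIDERS, need_tools=True)["name"])   # provider-c
```

For an agent workload, the tool-use filter matters more than raw price: in this toy table the cheapest row overall cannot call tools at all.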
The middle of the talk is a love letter to local agents getting dramatically easier. She calls out Pi as a favorite for simple setup, llama.cpp’s baked-in llama-agent binary for one-command startup from a Hub model ID, and then goes full fangirl on Hermes Agent, saying “I will just die on this hill” because of its memory management and easy integrations with Slack or WhatsApp.
Her most human moment is a small failure story: she initially couldn’t get Hermes wired into Slack, with colleague Niels there to witness it. Then she asked GLM 5.1 to fix the integration from inside the agent setup, and it resolved the issue itself. “It was a good day,” she says, which lands as a very concrete endorsement.
Hugging Face now supports a dataset repo type called traces, where sessions from Codex, Claude Code, or Pi can be uploaded, browsed in a parsed viewer, and eventually reused for training. She also walks through practical serving details: filtering models by support in local apps like LM Studio and llama.cpp, checking GGUF compatibility, and seeing, for example, that a 4-bit quantization of the larger Gemma 4 model fits on an L4 GPU with 24 GB of VRAM.
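The claim that a 4-bit quantized model fits on a 24 GB L4 can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is not from the talk. The 27-billion parameter count and the flat 20% overhead are illustrative assumptions, not figures for any specific Gemma release, and real GGUF files mix bit widths across tensors, so treat the numbers as rough.

```python
def quantized_weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def fits_on_gpu(num_params, bits_per_weight, vram_gb, overhead=0.20):
    """Rough check: weights plus a flat overhead for KV cache and activations.

    Returns (fits, gigabytes_needed). The overhead fraction is an assumption.
    """
    needed = quantized_weight_gb(num_params, bits_per_weight) * (1 + overhead)
    return needed <= vram_gb, needed


HYPOTHETICAL_PARAMS = 27e9  # assumption for illustration only
L4_VRAM_GB = 24             # the L4 figure mentioned in the talk

for bits in (16, 8, 4):
    ok, needed = fits_on_gpu(HYPOTHETICAL_PARAMS, bits, L4_VRAM_GB)
    print(f"{bits:>2}-bit: ~{needed:.1f} GB needed, fits on L4: {ok}")
```

Under these assumptions only the 4-bit variant fits, which is the kind of result the compatibility view surfaces without making you do the arithmetic.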
The final stretch is the headline promise: Hugging Face skills let an agent manage repos, train LLMs and VLMs, build Gradio demos, inspect datasets, and call Spaces through MCP. Her example of Claude Code fine-tuning Qwen 2.5 VL on LLaVA Instruct Mix is what she calls “absolute sci-fi”: the agent asks a few setup questions, computes VRAM and cost, launches the job, and leaves a model on the Hub. She ends with Niels’s workflow for running OCR on 30,000 papers using open OCR models, jobs, and prompting, with the agent handling the script-writing and instance selection.
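The "infra math" the agent performs can be approximated by hand. What follows is a rough sketch, not the agent's actual logic. It assumes full fine-tuning with mixed-precision Adam at roughly 16 bytes per parameter, plus a flat 20% for activations. The instance names, VRAM sizes, hourly prices, model size, and job duration are all invented for illustration.

```python
from dataclasses import dataclass

# Rule of thumb for mixed-precision Adam: 2 bytes (fp16 weights) + 2 (fp16 grads)
# + 12 (fp32 master weights and two Adam moments) = 16 bytes per parameter.
BYTES_PER_PARAM_FULL_FT = 16


@dataclass
class Instance:
    name: str            # hypothetical label, not a real instance type
    vram_gb: float
    usd_per_hour: float  # made-up price


def training_vram_gb(num_params: float, activation_overhead: float = 0.20) -> float:
    """Estimated VRAM for full fine-tuning, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM_FULL_FT / 1e9 * (1 + activation_overhead)


def pick_instance(num_params, hours, catalog):
    """Return (cheapest instance that fits, VRAM needed in GB, total cost in USD)."""
    need = training_vram_gb(num_params)
    viable = [i for i in catalog if i.vram_gb >= need]
    if not viable:
        raise ValueError(f"no instance offers {need:.1f} GB of VRAM")
    best = min(viable, key=lambda i: i.usd_per_hour)
    return best, need, best.usd_per_hour * hours


CATALOG = [
    Instance("small-24gb", 24, 1.0),
    Instance("medium-48gb", 48, 2.0),
    Instance("large-80gb", 80, 4.0),
]

# Hypothetical 3-billion-parameter model and a 2-hour job.
best, need, cost = pick_instance(3e9, hours=2, catalog=CATALOG)
print(f"need ~{need:.1f} GB, choose {best.name}, estimated cost ${cost:.2f}")
```

Parameter-efficient methods such as LoRA would cut the memory estimate sharply, which is one reason an agent might ask setup questions before picking an instance.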