AI Engineer · May 13, 2026 · 19m

Your Agent Can Now Train Models — Merve Noyan, Hugging Face

TL;DR

Hugging Face is turning agents into actual ML operators — Merve Noyan shows agents that can now kick off fine-tunes, launch jobs, explore datasets, build demos, and even choose infra via Hugging Face “skills,” including prompts like “train Qwen 2.5 VL on this dataset for me.”
Open models are no longer the consolation prize: she points to the Artificial Analysis Intelligence Index, says open models have effectively caught up, and names GLM 5.1 as a standout that she personally uses for coding and that ranks highly on benchmarks like SWE-bench Pro.
Vision is becoming the default for agentic models — Noyan argues labs are increasingly shipping VLMs on day zero, citing Gemma 4, Qwen 3.5, and Kimi K2.5, because vision-capable models can act like computer-use agents over screenshots and UI flows.
Local open-source agents are now easy enough to feel boring, in a good way: she highlights Pi, llama.cpp’s built-in llama-agent binary, Hermes Agent, GGUF quantization, and Hugging Face’s “Use this model” flow as making local serving and coding agents dramatically less “frictiony.”
Agent traces are becoming training data, not just logs — Hugging Face now has a new dataset repo type called “traces” that can host sessions from tools like Codex, Claude Code, or Pi, parse them in the viewer, and later feed them back into model training.
The most sci-fi moment is agents doing the infra math for you — in her examples, the agent estimates VRAM, asks about validation split, chooses an instance, calculates cost, writes the training or OCR job script, and leaves you with a finished model on the Hub.

Summary

Why open source matters more now

Noyan opens with a mini manifesto: in ML, openness is not one thing but a spectrum. It runs from open weights with non-commercial terms, through truly open-source models under MIT or Apache 2.0, to fully open stacks where the harness and agent code are exposed too. Her practical point is sharp: when a cloud-hosted model’s performance silently degrades, an open system lets you see the change, control the model, quantize it, fine-tune it, and even deploy it to edge devices or browsers for better privacy.

Open models have caught up, and the Hub is the center of gravity

She pushes back on the old “open models aren’t good enough” narrative, calling it outdated, and singles out GLM 5.1 as “absolutely crushing it” and as part of her own coding setup. The Hugging Face Hub, now nearing 3 million models, is her operating system for all of this: not just model hosting, but also the inference layer and discovery surface for the open ecosystem.

The rise of agentic VLMs

One clear trend she sees: agentic models are increasingly vision-first. She splits the field into LLMs and VLMs, then argues VLMs are especially powerful because they can operate like computer-use agents over screenshots, knowing where to click; Gemma 4, Qwen 3.5, and Kimi K2.5 are her examples of labs shipping vision capabilities from day zero.

Better ways to choose and compare models

With millions of models available, she says picking one used to be a mess, so Hugging Face added benchmark datasets directly into the datasets UI. You can now click into SWE-bench Pro, Humanity’s Last Exam, AIME, and others to see ranked open models, then “vibe check” them through Inference Providers, which route requests across vendors like Groq and Cerebras and expose columns like cheapest, fastest, and tool use.
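The “cheapest” and “fastest” columns amount to a selection policy over a provider table. A minimal sketch of that idea follows; it is not Hugging Face’s routing code. The provider names, prices, and throughput numbers are made up for illustration, and `pick_provider` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    usd_per_million_tokens: float  # made-up price, for illustration only
    tokens_per_second: float       # made-up throughput, for illustration only
    supports_tools: bool


def pick_provider(providers, policy, need_tools=False):
    """Return the provider that best matches `policy` ("cheapest" or "fastest")."""
    # Drop providers without tool calling when the caller requires it.
    candidates = [p for p in providers if p.supports_tools or not need_tools]
    if not candidates:
        raise ValueError("no provider satisfies the requirements")
    if policy == "cheapest":
        return min(candidates, key=lambda p: p.usd_per_million_tokens)
    if policy == "fastest":
        return max(candidates, key=lambda p: p.tokens_per_second)
    raise ValueError(f"unknown policy: {policy!r}")


# Hypothetical provider table; none of these values describe a real vendor.
PROVIDERS = [
    Provider("provider-a", 0.60, 450.0, True),
    Provider("provider-b", 0.25, 120.0, False),
    Provider("provider-c", 0.40, 900.0, True),
]

if __name__ == "__main__":
    print(pick_provider(PROVIDERS, "cheapest").name)
    print(pick_provider(PROVIDERS, "fastest").name)
    print(pick_provider(PROVIDERS, "cheapest", need_tools=True).name)
```

Filtering on tool support before ranking means an agent workload never lands on a provider that cannot call tools, even when that provider is the cheapest.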

Local coding agents: Pi, llama.cpp, and Hermes

The middle of the talk is a love letter to local agents getting dramatically easier. She calls out Pi as a favorite for simple setup, llama.cpp’s baked-in llama-agent binary for one-command startup from a Hub model ID, and then goes full fangirl on Hermes Agent, saying “I will just die on this hill” because of its memory management and easy integrations with Slack or WhatsApp.

The GLM 5.1 anecdote that sold her

Her most human moment is a small failure story: she initially couldn’t get Hermes wired into Slack, with colleague Niels there to witness it. Then she asked GLM 5.1 to fix the integration from inside the agent setup, and it resolved the issue itself — “it was a good day,” she says, which lands as a very concrete endorsement.

Traces, quantization, and making local serving sane

Hugging Face now supports a dataset repo type called traces, where sessions from Codex, Claude Code, or Pi can be uploaded, browsed in a parsed viewer, and eventually reused for training. She also walks through practical serving details — filtering model support by local apps like LM Studio and llama.cpp, checking GGUF compatibility, and seeing things like a 4-bit-quantized larger Gemma 4 fitting on an L4 GPU with 24 GB VRAM.
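The arithmetic behind “a 4-bit model fits on a 24 GB card” can be sketched in a few lines. This is a back-of-the-envelope estimate, not the check the Hub performs. The 30-billion parameter count and the 20% overhead for KV cache and activations are assumptions chosen for illustration, not figures from the talk.

```python
def weight_memory_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_vram(n_params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead_fraction: float = 0.2) -> bool:
    """Rough fit check: weights plus an assumed overhead for KV cache and activations."""
    needed_gb = weight_memory_gb(n_params_billion, bits_per_weight) * (1 + overhead_fraction)
    return needed_gb <= vram_gb


if __name__ == "__main__":
    params_b = 30.0  # hypothetical model size in billions of parameters
    for bits in (16, 8, 4):
        print(bits, weight_memory_gb(params_b, bits), fits_in_vram(params_b, bits, vram_gb=24.0))
```

Under these assumptions the 16-bit and 8-bit versions do not fit in 24 GB, while the 4-bit version does. Real 4-bit GGUF formats also store per-block scales, so their effective bits per weight are somewhat above 4.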

Your agent can now train models — and even OCR 30,000 papers

The final stretch is the headline promise: Hugging Face skills let an agent manage repos, train LLMs and VLMs, build Gradio demos, inspect datasets, and call Spaces through MCP. Her example, Claude Code fine-tuning Qwen 2.5 VL on LLaVA Instruct Mix, is what she calls “absolute sci-fi”: the agent asks a few setup questions, computes VRAM and cost, launches the job, and leaves a model on the Hub. She ends with Niels’s workflow for OCR’ing 30,000 papers using open OCR models, jobs, and prompting, with the agent handling the script-writing and instance selection.
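The instance-and-cost step the agent performs is simple enough to sketch: pick the cheapest instance with enough VRAM, then multiply its hourly price by the expected runtime. This is an illustration only, not the logic inside Hugging Face skills. The catalog entries, prices, VRAM requirement, and runtime are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Instance:
    name: str
    vram_gb: float
    usd_per_hour: float  # made-up price, for illustration only


def choose_instance(instances: Sequence[Instance],
                    required_vram_gb: float) -> Optional[Instance]:
    """Return the cheapest instance with enough VRAM, or None if none fits."""
    fitting = [i for i in instances if i.vram_gb >= required_vram_gb]
    if not fitting:
        return None
    return min(fitting, key=lambda i: i.usd_per_hour)


def estimate_cost_usd(instance: Instance, hours: float) -> float:
    """Job cost as hourly price times expected runtime."""
    return instance.usd_per_hour * hours


# Hypothetical catalog; names and prices do not describe any real offering.
CATALOG = [
    Instance("small-gpu", 24.0, 1.0),
    Instance("medium-gpu", 48.0, 2.0),
    Instance("large-gpu", 80.0, 4.0),
]

if __name__ == "__main__":
    chosen = choose_instance(CATALOG, required_vram_gb=40.0)  # assumed requirement
    if chosen is not None:
        print(chosen.name, estimate_cost_usd(chosen, hours=3.0))  # assumed runtime
```

Returning `None` when nothing fits leaves the caller free to respond by lowering the requirement, for example with a smaller batch size or quantized training.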
