AI Engineer · May 25, 2026 · 18m

Does GenAI "belong" to data scientists? — Phil Hetzel, Braintrust

TL;DR

Agents are not just another predictive model — Hetzel argues that OpenAI, Anthropic, and Mistral already did the core model-training work, so building agents is less about classic ML pipelines and more about shaping prompts, context, systems, and feedback loops.
Traditional enterprises often assign GenAI to the wrong team by default — He sees CEOs and CIOs push “we need agents,” then delegate the work to existing ML or data science teams simply because generative AI sounds adjacent to their remit.
Context engineering changes who can contribute — Unlike traditional ML, where value often comes from feature engineering and retraining, agent behavior can often be improved by changing prompts and context, which opens the door to product managers and domain experts.
Data scientists still matter most around rigor and guardrails — Hetzel says ML-minded teams bring a healthy skepticism about how LLMs work, stronger testing discipline, and the ability to evaluate LLM-as-judge systems with labeled datasets and metrics like precision, recall, and F1.
The biggest failure mode is optimizing the wrong metrics — Teams trained on traditional ML can overfocus on precision, recall, and F1, while agent evaluation needs a broader view of functional performance across a much wider surface area.
Great agent teams are intentionally cross-functional — His ideal setup combines product, application, and systems engineers with non-technical subject matter experts doing prompt design and human annotation, plus data scientists building eval and observability pipelines.
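The point about checking an LLM-as-judge against labeled data can be made concrete with a small sketch. The talk summary does not show code, so everything below is an illustrative assumption: the `score_judge` helper, the `JudgeMetrics` container, and the six hand-made labels are hypothetical, and a real eval pipeline would pull verdicts and annotations from its own tooling.

```python
from dataclasses import dataclass


@dataclass
class JudgeMetrics:
    precision: float
    recall: float
    f1: float


def score_judge(human_labels: list[bool], judge_verdicts: list[bool]) -> JudgeMetrics:
    """Compare LLM-judge verdicts with human annotations.

    True means "this agent response is bad" (the positive class).
    Hypothetical helper for illustration, not part of any library.
    """
    if len(human_labels) != len(judge_verdicts):
        raise ValueError("label and verdict lists must have the same length")

    pairs = list(zip(human_labels, judge_verdicts))
    tp = sum(1 for h, j in pairs if h and j)        # judge flagged a truly bad response
    fp = sum(1 for h, j in pairs if not h and j)    # judge flagged a good response
    fn = sum(1 for h, j in pairs if h and not j)    # judge missed a bad response

    # Guard against division by zero when a class is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return JudgeMetrics(precision=precision, recall=recall, f1=f1)


# Made-up example data: six responses annotated by a domain expert,
# alongside the verdicts an LLM judge gave for the same responses.
human_labels = [True, True, True, False, False, False]
judge_verdicts = [True, True, False, True, False, False]

metrics = score_judge(human_labels, judge_verdicts)
print(metrics)
```

Here the metrics describe the judge, not the agent: they indicate how far the automated grader can be trusted before its scores feed a broader view of the agent's functional performance.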

The Breakdown

“The model’s already built” is Phil Hetzel’s blunt case for why agentic AI shouldn’t be handed automatically to data scientists just because it has “AI” in the name. His answer to whether GenAI belongs to data scientists lands in the middle: the best agent teams mix data scientists, product and systems engineers, and domain experts who actually understand the problem.