Joe Reis · 52m

The Hidden Costs of AI Agents & Cloud Data with Sanjay Agrawal (Revefi, co-founder ThoughtSpot, MS)

TL;DR

  • Cloud data costs are usually a systems-and-culture problem, not just a bad-query problem — Sanjay Agrawal says small data teams inherited consumption pricing, endless demand, and “Gmail habits” where nobody ever comes back to turn things off, so unused pipelines and bloated workloads quietly pile up.

  • Sanjay’s thesis comes from building at extreme scale at ThoughtSpot, Microsoft Research, and Google — he cites a ThoughtSpot engine designed for 100 nanoseconds per row and a Fortune 5 query hitting 25 billion rows, 10–12 joins, and 6,000 cores in a few seconds as proof that performance, automation, and operational discipline have to be engineered together.

  • The hidden cost of cloud warehouses lives across multiple layers — in BigQuery it might be slot management, in Databricks outdated libraries plus cluster configuration plus region choice, and in Snowflake t-shirt-size warehouses and multiclustering; teams often just “hit the accelerator” instead of asking what the minimum resource is that still gets the job done.

  • AI agents are already detonating data budgets because they don’t behave like humans — one customer’s natural-language system on BigQuery drove seven figures of extra spend, and another single agentic use case burned roughly 10% of a budget in days because agent-generated query volume and experimentation blow past the assumptions behind human-oriented warehouse pricing.

  • Full autonomy is realistic for resource tuning, not for business-logic changes — Agrawal says self-driving optimization works for jobs, slots, and warehouse sizing, but the moment you change query semantics or filters, it stops being a platform problem and becomes a consulting problem that still needs a human in the loop.

  • The next five years likely mean more open formats, more engine routing, and more pressure to prove ROI — Agrawal expects rising data volume, lower marginal value per extra byte, stronger adoption of open standards like Iceberg-style architectures, and optimization decisions that may involve moving workloads between Snowflake, Databricks, ClickHouse, or other engines instead of just tuning inside one platform.

The Breakdown

Sanjay’s backstory: from database research to Revefi

Sanjay Agrawal opens by framing Revefi’s mission simply: help data teams deliver the right data at the right time and at the right spend. That comes from a career arc through Microsoft Research, Google, and co-founding ThoughtSpot, where he learned both the deep database mechanics and the very human reality that data teams are small, overloaded, and constantly firefighting.

The ThoughtSpot war stories that shaped his worldview

He gets delightfully nerdy here: at ThoughtSpot, his team built what he calls perhaps the world’s fastest in-memory parallel data warehouse engine, targeting 100 nanoseconds per row. His proof point is a Fortune 5 deployment where one ad hoc query touched 25 billion rows, ran 10–12 joins across 6,000 cores, and came back in seconds — all while staying fully ACID-compliant. The real lesson wasn’t just speed; it was that systems at that scale have to self-manage because no team can survive on endless Thanksgiving debugging sessions.
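A quick back-of-the-envelope check shows how those numbers fit together. This is only a sketch: it assumes each row is processed once at the 100 ns target and that the work spreads perfectly across all 6,000 cores, neither of which the episode states. The function names are made up for illustration.

```python
# Figures quoted in the episode.
NS_PER_ROW = 100            # engine design target: 100 nanoseconds per row
ROWS = 25_000_000_000       # rows touched by the Fortune 5 query
CORES = 6_000               # cores the query ran across


def core_seconds(rows: int, ns_per_row: int) -> float:
    """Total CPU time in seconds if every row costs ns_per_row on one core."""
    return rows * ns_per_row / 1e9


def wall_seconds(rows: int, ns_per_row: int, cores: int) -> float:
    """Elapsed time for one pass, ASSUMING perfect parallelism (an idealization)."""
    return core_seconds(rows, ns_per_row) / cores


if __name__ == "__main__":
    print(f"CPU time for one pass: {core_seconds(ROWS, NS_PER_ROW):,.0f} core-seconds")
    print(f"Ideal wall time:       {wall_seconds(ROWS, NS_PER_ROW, CORES):.2f} s")
```

Under those assumptions a single pass is about 2,500 core-seconds, or roughly 0.4 seconds of wall time. A query with 10–12 joins needs more than one pass, so "a few seconds" is at least consistent with the per-row target.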

Why cloud data warehouses created a “perfect storm” for data teams

Agrawal says the shift to Snowflake, BigQuery, Databricks, and consumption pricing changed everything while team sizes stayed basically the same. Demand for data exploded, the shelf life of data got shorter, and teams kept accumulating pipelines and workloads because, as he jokes, “Gmail has spoiled us” — nobody ever comes back and says, please reduce my compute. That combination made surprise spend, stale data, and operational chaos more of a norm than an exception.

The old habits that blow up in the cloud

One of the liveliest moments is when Joe brings up teams lifting old on-prem database habits into cloud platforms and torching budgets. Agrawal says he sees this constantly: things like row-level updates, merges, and upserts that made sense in OLTP systems can become much more expensive in architectures like Snowflake, where an update may rewrite an entire micro-partition. The pattern is familiar: teams do a lift-and-shift, test on a small benchmark, everything looks fine, then production hits and “all of this hits the fan really quite violently.”

Where surprise costs actually come from

Instead of blaming vendors, Agrawal breaks cost down into layers: logic, configuration, and resource choices. In Databricks, inefficiency might live in the source code, stale libraries, cluster settings, or running in the wrong region; in Snowflake, teams abuse warehouse sizes because going from small to XL is easier than fixing the query. His favorite analogy: a self-driving car can’t just accelerate; it also has to know when to brake. Cost optimization needs the same balance, because saving pennies while creating hours of operational pain is just “penny wise, pound foolish.”
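The "minimum resource that still gets the job done" idea can be sketched as a small selection problem. The credits-per-hour values are Snowflake's standard warehouse rates, which double with each size. Everything else here is hypothetical: the runtimes are invented for illustration, the function names are not from Revefi or Snowflake, and the model ignores per-second billing minimums and queueing.

```python
from typing import Dict, Optional

# Snowflake standard warehouse credits per hour (each size doubles the last).
CREDITS_PER_HOUR: Dict[str, int] = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}


def job_credits(size: str, runtime_minutes: float) -> float:
    """Credits one run consumes at a given size (simplified: no billing minimum)."""
    return CREDITS_PER_HOUR[size] * runtime_minutes / 60


def cheapest_size(runtimes: Dict[str, float], deadline_minutes: float) -> Optional[str]:
    """Return the size with the lowest credit cost that still meets the deadline.

    `runtimes` maps size -> measured runtime in minutes. Ties go to the
    smaller warehouse. Returns None if no size meets the deadline.
    """
    candidates = [
        (job_credits(size, minutes), CREDITS_PER_HOUR[size], size)
        for size, minutes in runtimes.items()
        if minutes <= deadline_minutes
    ]
    return min(candidates)[2] if candidates else None


if __name__ == "__main__":
    # HYPOTHETICAL runtimes for one job; real numbers would come from query history.
    runtimes = {"S": 45, "M": 24, "L": 14, "XL": 11}
    for deadline in (60, 30, 12):
        print(deadline, "min deadline ->", cheapest_size(runtimes, deadline))
```

With these made-up runtimes, XL finishes the job only slightly faster than L but costs far more credits, so it is only worth choosing when the deadline leaves no other option. That is the "brake as well as accelerate" point in miniature.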

Revefi’s pitch: the self-driving layer for data operations

Agrawal says Revefi’s design principle was to act like a distinguished engineer embedded in your data team, not just another siloed cost tool. He shares a striking example: one Fortune 500 company connected 71 warehouses, flipped on auto-management, and within days saw spend drop by about 50%. He says the first customer reaction is usually skepticism (“is it real?”), but the visibility tends to win teams over because it surfaces issues early and gives them something concrete to take back to stakeholders.

AI agents are making the cost problem much worse

The conversation turns hard toward AI: Agrawal says that just a few months ago, AI spend wasn’t a common concern, but now it’s becoming unavoidable. He gives two examples — a natural-language system over BigQuery that drove seven figures of additional spend, and another single agentic use case that consumed about 10% of a budget in a few days. His point is sharp: all the warehouse optimizations built for human dashboards go out the window when agents can fire off 10x–20x more exploratory queries and a missed join or filter suddenly scans billions or trillions of rows.

What gets automated, what still needs humans, and where data goes next

Agrawal is bullish but not naive: yes, he thinks self-driving optimization is real for jobs, slot tuning, and warehouse sizing, but no, he doesn’t think humans disappear when business logic changes. The moment a query rewrite changes semantics, he says, it stops being a platform problem and becomes a consulting problem. Looking ahead five years, he expects more open table formats, more specialized engines, more cross-platform routing, and a growing need to prove that more data actually creates more top-line or bottom-line value rather than just more spend.
