OpenAI vs. Anthropic's Direct Faceoff + Future of Agents — With Aaron Levie
TL;DR
Agents are the real prize, not chatbots — Aaron Levie says the shift is from back-and-forth chat to systems that can take a task, use tools, write code, and work for minutes or hours, expanding the market from engineers to every knowledge worker — roughly a 30x to 50x bigger opportunity.
Enterprise adoption will be slower than Silicon Valley thinks — Levie’s core argument is that coding took off because it’s text-based, verifiable, and used by technical people, while most enterprise work is messy, spread across 20 to 100 systems, and full of tribal knowledge agents can’t easily infer.
The bottleneck is not model intelligence but enterprise context — His sharpest line is that “an AI problem is really a data problem”: if a company’s contracts, marketing assets, and research live in five different repositories, the agent won’t know the true source of truth any better than a brilliant new employee dropped in cold.
Labs and app builders can both win — On whether OpenAI/Anthropic or vertical startups capture the value, Levie won’t pick a side; his view is that the labs own the intelligence layer either way, while companies like Harvey or other domain-specific players may still win where regulation, workflow depth, and trust matter.
New models suggest there’s no wall in sight — Referencing OpenAI’s rumored “Spud” and Anthropic’s next large model, Levie says the last four months already produced double-digit gains on Box’s internal knowledge-work evals, and he expects major improvements across coding, tool use, legal, finance, and life sciences.
OpenAI vs. Anthropic is probably more like AWS vs. Azure in 2010 than a winner-take-all race — Levie compares today’s model battle to the early cloud wars, noting AWS was only about $500 million in revenue in 2010, yet the market grew into a couple hundred billion dollars with multiple huge winners.
The Breakdown
The rivalry has converged into a full-stack AI battle
Alex Kantrowitz opens with the big premise: OpenAI and Anthropic are increasingly building the same thing. Levie agrees the collision was inevitable — if you’re packing “super intelligence” into a model, eventually the use cases converge and the labs end up competing head-to-head across coding, enterprise, and consumer work.
From coding assistant to general-purpose knowledge worker
Levie’s key reframing is that the big breakthrough isn’t just that AI got good at coding — it’s that coding skill can now be applied to the rest of knowledge work. His mental model is basically: imagine every lawyer, marketer, or researcher suddenly became an expert computer power user who could write scripts for any task; that’s what agents start to look like.
Why the money shows up in business first
He agrees with Greg Brockman’s “AI like a laptop” framing — one system for personal and work life — but says the ROI shows up first in enterprise because tokens are expensive and businesses can justify them with GDP-linked output. In his view, the real economic value comes from systems that do substantive work, not just answer casual questions.
The fax machine argument — but with a catch
When Alex asks whether people will even want this, Levie dismisses the resistance as a classic fax-machine argument: efficiency usually wins. But he also pushes back on Silicon Valley hype, saying people over-extrapolate from coding, where the environment is unusually favorable: text-only, fully visible context, easy verification, and highly technical users who know how to recover when the model goes off the rails.
Why the rest of knowledge work is much harder than code
This is the heart of the interview. In marketing, legal, healthcare, or design, context lives across dozens of systems, tasks are subjective, users aren’t deeply technical, and outputs often need human review because there’s no instant “did it compile?” equivalent. That’s why Levie thinks rollout will take years — and why he sees massive opportunity for products that bridge today’s enterprise messiness to the agent future.
Taste, editing, and the limits of pure automation
The two use podcast video editing as a running joke and benchmark. Levie thinks agents will absolutely automate a lot of the cutting and generation, but not erase editorial judgment; instead, the human editor becomes the senior reviewer choosing among five AI-generated cuts, compressing what used to be multiple layers of production into one person with a fleet of agents.
Trust is the real adoption wall
Alex’s most visceral objection is personal: he wants an agent in theory, but not one loose in his inbox or texts. Levie basically says that instinct is right — the safer pattern is to treat the agent like a separate coworker with its own inbox, Box account, and tightly scoped permissions, because prompt injection, data exfiltration, and liability in fields like medicine or finance are still wide-open problems.
The enterprise mess: agents join your company like genius new hires with no tribal knowledge
Levie’s best analogy lands here. An agent inside a 10,000-person company is like a genius PhD who joined one minute ago: incredibly smart, but clueless about where the real contract lives, which repository actually matters, or which system is the source of truth. That’s why he keeps returning to the same point: enterprise AI is downstream of data organization, governance, and process clarity.
Who captures the value — labs or vertical apps?
On the build-vs-wrap debate, Levie gives a very non-clickbait answer: the jury’s still out. He thinks horizontal labs win in every scenario because they supply the intelligence layer, but vertical products can still build huge businesses where customers want a purpose-built solution for legal, healthcare, M&A, or other regulated, high-context workflows.
Bigger models are coming, and Levie says the “wall” narrative is dead
Asked about OpenAI’s rumored “Spud” and Anthropic’s upcoming larger model, Levie says the important point is that capability progress is still accelerating. He says Box’s own hard knowledge-work evals have already seen double-digit gains in just the last model-family update, which he reads as evidence that new enterprise categories will keep unlocking.
Why he won’t pick OpenAI or Anthropic — and why that may be the point
Levie dodges the “who wins?” question, but with a useful analogy: trying to call the AI race now is like predicting the cloud wars in 2008 or 2010. AWS was only about $500 million in revenue in 2010, Azure had just launched, Google was still basically “Google App Engine,” and yet 15 years later the market supported multiple giant providers — which is exactly why he thinks today’s AI skirmishes may matter less than the sheer size of the market that’s coming.