Theo - t3.gg · June 30, 2026 · 31m

Why is OpenAI so much more efficient?

TL;DR

Token efficiency beats per-token pricing: GPT-5.5 medium used half the tokens of GPT-4o X high while scoring higher, making it cheaper overall despite doubled per-token prices.
Reasoning tokens are the hidden cost: Models generate thousands of tokens "talking to themselves" before producing answers, and these reasoning tokens become input tokens on every subsequent step, so cost compounds as a run gets longer (roughly quadratically in the number of steps, rather than linearly).
OpenAI uses "grug brain" reasoning: Leaked traces show OpenAI models reason in fragments like "need agent kind maybe open hands direct okay" instead of full sentences, slashing token counts dramatically.
Competitors can't see the secret sauce: Frontier labs only show summarized reasoning traces, not raw ones, preventing competitors from learning how OpenAI achieves such efficiency.
Claude's verbosity explains its 1M context window: Claude's plain-English reasoning traces are so long that Anthropic needs massive context windows just to fit them, unlike OpenAI's compressed approach.
Different reasoning styles emerge: GLM-5.2 switched from verbose "wait, actually, let me reconsider" reasoning to a more efficient format, cutting tokens by two-thirds for the same task.
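
The reasoning-token point above can be made concrete with a small sketch. It assumes a deliberately simplified agent loop: every step emits the same number of tokens, all earlier output is re-read as input on each later step, and there is no prompt caching or context trimming. The function name `billed_tokens` and the step and token counts are hypothetical, chosen only for illustration.

```python
def billed_tokens(steps: int, tokens_per_step: int) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) billed over a whole run.

    Assumption: each step re-reads everything produced by earlier steps.
    """
    input_tokens = 0
    output_tokens = 0
    context = 0  # tokens accumulated from earlier steps
    for _ in range(steps):
        input_tokens += context           # earlier output is billed again as input
        output_tokens += tokens_per_step  # new reasoning and answer tokens
        context += tokens_per_step
    return input_tokens, output_tokens


for steps in (10, 20):
    for per_step in (1_000, 500):
        re_read, produced = billed_tokens(steps, per_step)
        print(f"steps={steps:>2} per_step={per_step:>5} "
              f"input={re_read:>7} output={produced:>6}")
```

In this model, doubling the number of steps roughly quadruples the re-read input, while halving the tokens emitted per step halves both totals. That is why a terser reasoning style pays off more the longer a run gets.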

The Breakdown

OpenAI's GPT-5.5 models achieve similar intelligence to competitors using a fraction of the tokens. Leaked reasoning traces reveal a bizarre "grug brain" shorthand in which the model thinks in fragments like "try period" instead of full sentences. The efficiency gap is massive: GPT-5.5 medium scored higher on Deep SWE benchmarks with 20K tokens than Gemini managed with 270K tokens, a 12-14x difference (270K / 20K = 13.5x) that translates directly to cost savings despite higher per-token pricing.
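
The arithmetic behind that claim can be sketched as follows. The 20K and 270K token counts come from the comparison above. The per-million-token prices are hypothetical placeholders, not real price sheets, and the sketch treats all tokens as billed at a single rate.

```python
def run_cost(tokens: int, price_per_million: float) -> float:
    """Cost of a run, assuming one flat price per million tokens."""
    return tokens / 1_000_000 * price_per_million


efficient_tokens = 20_000   # GPT-5.5 medium, from the comparison above
verbose_tokens = 270_000    # Gemini, from the comparison above

ratio = verbose_tokens / efficient_tokens

# Hypothetical prices: the efficient model charges double per token.
verbose_price = 10.0
efficient_price = 2 * verbose_price

efficient_cost = run_cost(efficient_tokens, efficient_price)
verbose_cost = run_cost(verbose_tokens, verbose_price)

print(f"token ratio:    {ratio:.1f}x")
print(f"efficient run:  ${efficient_cost:.2f}")
print(f"verbose run:    ${verbose_cost:.2f}")
print(f"cheaper model:  {'efficient' if efficient_cost < verbose_cost else 'verbose'}")
```

Under these assumptions, the efficient model stays cheaper until its per-token price exceeds about 13.5 times the verbose model's, which is the token ratio itself.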