AI Engineer · 10m

We Cut 94% of AI Coding Tokens With a Local Code Index - Rajkumar Sakthivel, Tesco

TL;DR

  • 90% of AI coding costs come from input, not output: Files, search results, and context sent to the model account for most expenses, while the AI's generated code represents only 10%.

  • Their typical query sent 45,000 tokens when only 5,000 mattered: They paid for 40,000 tokens of irrelevant code every single query, like ordering one pizza but paying for ten.

  • Cutting input by 94% saves 61% of total cost: Output compression saves only about 8% of the total, but input reduction has a massive impact because that's where the money goes.

  • Dual search (meaning + keyword) catches what each misses alone: Meaning-based search finds related ideas but misses exact names; keyword search does the opposite. Together they reduce missed results from 1 in 4 to 1 in 10.

  • Simple scoring formula beats complex AI models for relevance: A 50/30/20 formula (meaning/keyword/recency) runs in 0.4ms without extra AI calls, faster than asking AI to judge its own results.

  • Real test on FastAPI: 83K tokens down to 4.9K per question: That's a 94% reduction with 90% accuracy at finding the right code, tested on 53 files with 20 real developer questions.
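
To make the 50/30/20 scoring idea concrete, here is a minimal Python sketch. Only the three weights come from the summary above. The `Candidate` type, the field names, the assumption that each signal is already scaled to the 0 to 1 range, and the `rank` helper are all hypothetical and are not taken from the talk's actual implementation.

```python
from dataclasses import dataclass

# Weights from the summary: 50% meaning, 30% keyword, 20% recency.
W_MEANING, W_KEYWORD, W_RECENCY = 0.5, 0.3, 0.2


@dataclass
class Candidate:
    """A code chunk returned by the local index (hypothetical shape)."""
    path: str
    meaning: float  # semantic similarity, assumed already scaled to [0, 1]
    keyword: float  # keyword match score, assumed already scaled to [0, 1]
    recency: float  # assumed scale: 1.0 = just edited, 0.0 = very old


def combined_score(c: Candidate) -> float:
    """Weighted sum of the three signals; plain arithmetic, no model call."""
    return W_MEANING * c.meaning + W_KEYWORD * c.keyword + W_RECENCY * c.recency


def rank(candidates: list[Candidate], top_k: int = 5) -> list[Candidate]:
    """Return the top_k candidates, highest combined score first."""
    return sorted(candidates, key=combined_score, reverse=True)[:top_k]


# Example: a strong semantic match narrowly outranks an exact keyword match.
results = rank([
    Candidate("old_utils.py", meaning=0.1, keyword=0.1, recency=0.0),
    Candidate("params.py", meaning=0.4, keyword=1.0, recency=0.5),
    Candidate("routing.py", meaning=0.9, keyword=0.2, recency=0.5),
])
ranked_paths = [c.path for c in results]
```

Because the score is a fixed weighted sum, ranking costs a handful of multiplications per candidate, which is consistent with the sub-millisecond figure quoted above. How each signal is actually computed and normalized is left open here.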

The Breakdown

Raj and his friend Foss discovered that 90% of their AI coding costs came from sending irrelevant context, not the AI's output. They built a local search layer that cut tokens by 94% and saved them $186 on a real project by sending only the code that matters.
