AI Engineer | June 28, 2026 | 10m

We Cut 94% of AI Coding Tokens With a Local Code Index - Rajkumar Sakthivel, Tesco

TL;DR

90% of AI coding costs come from input, not output: Files, search results, and context sent to the model account for most expenses, while the AI's generated code represents only 10%.
Their typical query sent 45,000 tokens when only 5,000 mattered: They paid for 40,000 tokens of irrelevant code on every single query, like ordering one pizza but paying for ten.
Cutting input by 94% saves 61% total cost: Output compression only saves about 8% total, but input reduction has massive impact because that's where the money goes.
Dual search (meaning + keyword) catches what each misses alone: Meaning-based search finds related ideas but misses exact names; keyword search does the opposite. Together they reduce missed results from 1 in 4 to 1 in 10.
Simple scoring formula beats complex AI models for relevance: A 50/30/20 formula (meaning/keyword/recency) runs in 0.4ms without extra AI calls, faster than asking AI to judge its own results.
Real test on FastAPI: 83K tokens down to 4.9K per question: That's 94% reduction with 90% accuracy finding the right code, tested on 53 files with 20 real developer questions.
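
The 50/30/20 scoring formula above can be sketched in a few lines of Python. This is a minimal illustration, not the speakers' implementation: it assumes each signal (meaning, keyword, recency) has already been scaled to the range 0 to 1, and the `Candidate` type, the `score` and `top_k` helpers, and the file names are all hypothetical.

```python
from dataclasses import dataclass

# Weights from the summary above: 50% meaning, 30% keyword, 20% recency.
W_MEANING, W_KEYWORD, W_RECENCY = 0.5, 0.3, 0.2


@dataclass
class Candidate:
    path: str       # hypothetical: file the code chunk came from
    meaning: float  # semantic similarity, assumed pre-scaled to [0, 1]
    keyword: float  # keyword-match score, assumed pre-scaled to [0, 1]
    recency: float  # freshness of the chunk, assumed pre-scaled to [0, 1]


def score(c: Candidate) -> float:
    """Weighted sum of the three signals; no extra model call is involved."""
    return W_MEANING * c.meaning + W_KEYWORD * c.keyword + W_RECENCY * c.recency


def top_k(candidates: list[Candidate], k: int = 5) -> list[Candidate]:
    """Return the k highest-scoring candidates, best first."""
    return sorted(candidates, key=score, reverse=True)[:k]


if __name__ == "__main__":
    # Made-up scores purely for illustration.
    candidates = [
        Candidate("routing.py", meaning=0.9, keyword=0.2, recency=0.5),
        Candidate("params.py", meaning=0.6, keyword=1.0, recency=0.1),
        Candidate("utils.py", meaning=0.3, keyword=0.0, recency=1.0),
    ]
    for c in top_k(candidates, k=2):
        print(f"{c.path}: {score(c):.2f}")
```

With these made-up inputs, an exact keyword match lets `params.py` edge out `routing.py` despite a lower meaning score, which is the kind of case where keyword search covers for what meaning-based search misses.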

The Breakdown

Raj and his friend Foss discovered that 90% of their AI coding costs came from the context they sent to the model, most of it irrelevant, rather than from the AI's output. They built a local search layer that cut tokens by 94% and saved $186 on a real project by sending only the code that matters.