Benchmarking semantic code retrieval on Claude Code — Kuba Rogut, Turbopuffer
TL;DR
Cursor's production results are strong: Kuba cites Cursor's blog showing a 24% relative improvement in answer accuracy for Composer, plus a 2.6% increase in code retention and a 2.2% drop in dissatisfied requests on large codebases.
Embeddings act like cached compute: Instead of re-grepping the same codebase every session, semantic indexing pays an upfront chunk-embed-index cost so multiple agents can query stored meaning later with fewer tokens and less repeated work.
Turbo Grep raised precision sharply on Claude Code: In Kuba's 50-task ContextBench-style evaluation, file precision improved from about 65% at baseline to 87% with windowed grep plus semantic search, meaning wasted file reads fell from roughly 1 in 3 (about 35%) to 1 in 8 (about 13%).
Semantic search did not automatically improve recall: Raw Claude Code still led on file recall because it aggressively explores many files, while semantic search and windowed grep ended up with similar recall despite better targeting.
Tool choice depends on the shape of the task: Semantic search won when files were behaviorally related but did not share obvious keywords, such as logic spread across multiple ORMs and libraries, while grep won when the task amounted to tracing imports from an early keyword hit.
Inline comments make semantic retrieval better: Kuba says messy code is harder, and repositories with strong inline documentation performed noticeably better because the embedding model could infer the chunk's meaning more accurately.
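The chunk-embed-index idea from the summary above can be sketched in a few lines. This is a minimal illustration of the pipeline shape only, not Turbopuffer's or Cursor's implementation: the function names (chunk, embed, build_index, query) and the sample files are hypothetical, and the "embedding" is a toy token-count vector standing in for a real embedding model, so it matches on shared words rather than on meaning.

```python
import math
import re
from collections import Counter

def chunk(text, max_lines=20):
    """Split source text into fixed-size line windows (a stand-in for syntax-aware chunking)."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]

def embed(text):
    """Toy embedding: lowercase token counts. A real system would call an embedding model here."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(files):
    """Pay the chunk-embed-index cost once; returns (path, chunk_text, vector) rows."""
    return [(path, c, embed(c)) for path, text in files.items() for c in chunk(text)]

def query(index, question, k=3):
    """Rank stored chunks against the question and return the paths of the top k."""
    q = embed(question)
    ranked = sorted(index, key=lambda row: cosine(q, row[2]), reverse=True)
    return [path for path, _, _ in ranked[:k]]

# Hypothetical two-file repository, indexed once and then queried.
files = {
    "net/client.py": "# retry the request with exponential backoff\ndef send(req):\n    return _attempt(req)\n",
    "ui/theme.py": "# color palette for the dark theme\nPALETTE = {'bg': '#000'}\n",
}
index = build_index(files)
top = query(index, "where is retry backoff handled", k=1)
```

The index is built once and can then serve any number of queries from any number of agents, which is the "cached compute" trade the summary describes. In this toy version the inline comment supplies the tokens that the query matches; a real embedding model would rely on the same text to infer what the chunk does.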
The Breakdown
Cursor saw a 24% relative accuracy lift and measurable user gains from semantic code search, but when Kuba Rogut bolted a similar approach onto Claude Code, the biggest win was precision, not recall. His benchmark shows semantic search is great at finding behaviorally related files that grep misses, while plain grep still wins on straightforward import tracing.
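To make the precision-versus-recall distinction concrete, here is a small sketch of how file-level precision and recall are commonly computed from the set of files an agent read and the set it actually needed. The source does not spell out the benchmark's exact metric definitions, so the set-based formulas, the function names, and the sample file lists below are assumptions for illustration.

```python
def file_precision_recall(read_files, gold_files):
    """Set-based file precision and recall (assumed definition, not taken from the benchmark)."""
    read, gold = set(read_files), set(gold_files)
    hits = len(read & gold)
    precision = hits / len(read) if read else 0.0  # share of reads that were needed
    recall = hits / len(gold) if gold else 0.0     # share of needed files that were read
    return precision, recall

def wasted_read_ratio(precision):
    """Fraction of file reads that turned out to be unnecessary."""
    return 1.0 - precision

# Hypothetical run: four files read, three files actually relevant.
precision, recall = file_precision_recall(
    ["a.py", "b.py", "c.py", "d.py"],
    ["a.py", "b.py", "e.py"],
)

# The reported precision figures translate into wasted reads like this.
baseline_waste = wasted_read_ratio(0.65)  # about 0.35, roughly 1 read in 3
improved_waste = wasted_read_ratio(0.87)  # about 0.13, roughly 1 read in 8
```

An agent that opens many files can score high on recall while scoring low on precision, which is consistent with raw Claude Code leading on recall while the retrieval-assisted setups led on precision.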