When All Context Matters: Extended Cache Augmented Generation - Luis Romero-Sevilla, Orbis
TL;DR
Simple RAG fails when all documents are relevant: vector databases retrieve only documents within a similarity threshold, but some scenarios require synthesizing answers across an entire collection.
GraphRAG is too slow for rapidly changing data: recomputing a knowledge graph every time documents get replaced is computationally expensive and time-consuming.
Cache Augmented Generation (CAG) hits context limits: loading all documents into a model's context window degrades answer quality when the window fills up.
Extended CAG distributes documents across parallel buckets: each bucket caches its own key-value (KV) state for the documents it holds, and a supervisor model interrogates the right buckets to synthesize answers.
Random distribution beats domain categorization: when documents have dense interconnections, organizing by domain causes supervisors to ignore seemingly irrelevant categories that actually matter.
No retrieval strategy fits all problems: each approach has trade-offs in compute, cost, and speed, so the right solution depends on the specific problem constraints.
The Breakdown
When every document in a collection matters for answering a question and the data turns over rapidly, traditional RAG and GraphRAG both fail. Luis Romero-Sevilla introduces Extended Cache Augmented Generation, a parallel approach that distributes documents across multiple cached context buckets and uses a supervisor model to interrogate them.
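The bucket-and-supervisor flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the speaker's implementation: `Bucket`, `distribute_randomly`, `answer`, and the `ask_bucket` / `ask_supervisor` callables are hypothetical names, the model calls are left as plug-in functions, and a real system would hold a precomputed KV cache per bucket instead of raw text. For simplicity the sketch queries every bucket, whereas the talk describes a supervisor that chooses which buckets to interrogate.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical model-call signature: (context, question) -> answer text.
AskFn = Callable[[str, str], str]


@dataclass
class Bucket:
    """One context bucket. Assumption: a real system would keep a
    precomputed KV cache here instead of the raw document text."""
    docs: List[str] = field(default_factory=list)

    def context(self) -> str:
        return "\n\n".join(self.docs)


def distribute_randomly(docs: List[str], n_buckets: int, seed: int = 0) -> List[Bucket]:
    """Shuffle documents and deal them round-robin into buckets, so no
    bucket maps to a single domain the supervisor might skip."""
    if n_buckets < 1:
        raise ValueError("n_buckets must be at least 1")
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)
    buckets = [Bucket() for _ in range(n_buckets)]
    for i, doc in enumerate(shuffled):
        buckets[i % n_buckets].docs.append(doc)
    return buckets


def answer(question: str, buckets: List[Bucket],
           ask_bucket: AskFn, ask_supervisor: AskFn) -> str:
    """Query each bucket in parallel, then let the supervisor synthesize
    one answer from the partial answers."""
    with ThreadPoolExecutor(max_workers=len(buckets)) as pool:
        partials = list(pool.map(lambda b: ask_bucket(b.context(), question), buckets))
    combined = "\n".join(f"[bucket {i}] {p}" for i, p in enumerate(partials))
    return ask_supervisor(combined, question)
```

To try it, pass any two functions that take a context string and a question and return text, for example thin wrappers around a model client. The fixed `seed` only keeps the document-to-bucket assignment reproducible between runs.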