Structuring the Unstructured - Cedric Clyburn, Red Hat
TL;DR
50x cost savings: A Hugging Face engineer processed Common Crawl PDFs with Docling at 50 times lower cost than VLMs or naive OCR, entirely on CPU with no GPUs.
Tables and images preserved: Unlike basic PDF parsers that output garbled linear text, Docling extracts tables as dataframes and annotates images using local vision models like Granite.
Chunkless RAG pattern: Docling enables retrieval without vector databases by using document outlines as the index, letting LLMs find relevant sections directly from markdown structure.
Local and open source: Docling is a Linux Foundation project that runs entirely on your machine, so no data leaves your environment. That matters for air-gapped enterprise deployments.
Scales via microservices: Docling Serve deploys as a REST API for processing thousands of documents, while an MCP server lets AI agents like Claude Code handle document conversion autonomously.
The Breakdown
Cedric Clyburn from Red Hat demonstrates Docling, an open-source document processing tool that converts PDFs and other unstructured files into LLM-ready formats at 50x lower cost than vision language models, all running locally without GPUs. He shows live demos of table extraction, image annotation, chunkless RAG patterns, and deployment options via REST API and MCP servers.
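The chunkless RAG pattern mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code shown in the talk: it assumes the document has already been converted to markdown (for example by Docling), builds an outline from the headings, and returns the body of the section that best matches a question. The names `split_sections`, `build_outline`, and `pick_section` are hypothetical, and `pick_section` uses simple keyword overlap as a stand-in for the step where an LLM would read the outline and choose a section.

```python
import re
from dataclasses import dataclass

# ATX-style markdown heading: one to six '#' characters, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Section:
    level: int  # heading depth: 1 for '#', 2 for '##', and so on
    title: str
    body: str   # text between this heading and the next heading


def split_sections(markdown: str) -> list[Section]:
    """Split markdown into sections, one per heading."""
    sections: list[Section] = []
    current = None
    in_fence = False
    for line in markdown.splitlines():
        # Lines inside fenced code blocks are never treated as headings.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            current = Section(len(match.group(1)), match.group(2), "")
            sections.append(current)
        elif current is not None:
            current.body += line + "\n"
    return sections


def build_outline(sections: list[Section]) -> str:
    """Render the numbered, indented outline that serves as the index."""
    return "\n".join(
        f"{i}: {'  ' * (s.level - 1)}{s.title}" for i, s in enumerate(sections)
    )


def pick_section(question: str, sections: list[Section]) -> int:
    """Hypothetical stand-in for the LLM: pick the title sharing the most words."""
    if not sections:
        raise ValueError("document has no headings to index")
    asked = set(re.findall(r"[a-z0-9]+", question.lower()))

    def overlap(i: int) -> int:
        title_words = set(re.findall(r"[a-z0-9]+", sections[i].title.lower()))
        return len(asked & title_words)

    return max(range(len(sections)), key=overlap)


if __name__ == "__main__":
    doc = (
        "# Annual Report\nIntro text.\n"
        "## Revenue\nRevenue grew 12% year over year.\n"
        "## Headcount\nThe team grew to 40 people.\n"
    )
    sections = split_sections(doc)
    print(build_outline(sections))
    chosen = pick_section("How did revenue change?", sections)
    print(sections[chosen].body)
```

In a real setup, the string from `build_outline` and the user's question would go to the model, which replies with a section number. Only that section's body is then passed back as context, so no embeddings or vector database are involved.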