Kreuzberg v4.5.0: We loved Docling's model so much that we gave it a faster engine

Reddit r/LocalLLaMA / 3/22/2026

Key Points

  • Kreuzberg v4.5.0 releases with a major upgrade: it now understands document structure and layout in addition to text by integrating Docling's RT-DETR v2 model into a Rust-native pipeline, enabling faster and more accurate document processing.
  • The integration uses Docling's layout model to classify 17 document element types and reconstruct Markdown tables with SLANet-Plus, matching internal structure to native PDF text positions.
  • Benchmark results show Kreuzberg is about 2.8x faster on average (1,032 ms/doc vs 2,894 ms/doc) with lower memory overhead and no Python dependency, across 171 PDFs including academic, government, and legal documents.
  • Kreuzberg is open-source (MIT), with bindings for Python, TypeScript/Node.js, and other languages, leveraging pdfium for text extraction, ONNX Runtime for inference, and Rust parallelism via Rayon.

Hi folks,

We just released Kreuzberg v4.5.0, and it's a big one.

Kreuzberg is an open-source (MIT) document intelligence framework supporting 12 programming languages. It is written in Rust, with native bindings for Python, TypeScript/Node.js, PHP, Ruby, Java, C#, Go, Elixir, R, C, and WASM. It extracts text, structure, and metadata from 88+ formats, runs OCR, generates embeddings, and is built for AI pipelines and document processing at scale.

What's new in v4.5.0

A lot! For the full release notes, please visit our changelog.

The core is this: Kreuzberg now understands document structure (layout/tables), not just text. You’ll see that we used Docling’s model to do it.

Docling is a great project, and their layout model, RT-DETR v2 (Docling Heron), is excellent. It's also fully open source under a permissive Apache license. We integrated it directly into Kreuzberg, and we want to be upfront about that.

What we've done is embed it into a Rust-native pipeline. The result is document layout extraction that matches Docling's quality and, in some cases, outperforms it. It's 2.8x faster on average, with a fraction of the memory overhead, and without Python as a dependency. If you're already using Docling and are happy with the quality, give Kreuzberg a try.

We benchmarked against Docling on 171 PDF documents spanning academic papers, government and legal docs, invoices, OCR scans, and edge cases:

Structure F1: Kreuzberg 42.1% vs Docling 41.7%

Text F1: Kreuzberg 88.9% vs Docling 86.7%

Average processing time: Kreuzberg 1,032 ms/doc vs Docling 2,894 ms/doc
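As a quick check on the arithmetic, the 2.8x figure follows directly from the two averages above. The sequential corpus totals below are an extrapolation from those averages, not a reported measurement.

```python
# Average per-document latencies reported in the benchmark (milliseconds).
kreuzberg_ms = 1032
docling_ms = 2894

# Speedup is the slower average divided by the faster one.
speedup = docling_ms / kreuzberg_ms

# Extrapolated (not reported): total time for the 171-document corpus
# if every document took exactly the average and ran sequentially.
docs = 171
kreuzberg_total_s = kreuzberg_ms * docs / 1000
docling_total_s = docling_ms * docs / 1000

print(f"speedup: {speedup:.2f}x")  # speedup: 2.80x
print(f"corpus total: {kreuzberg_total_s:.1f} s vs {docling_total_s:.1f} s")
```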

The speed difference comes from Rust's native memory management, pdfium text extraction at the character level, ONNX Runtime inference, and Rayon parallelism across pages.
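To illustrate the "parallelism across pages" idea only, here is a minimal Python analogue. It is not Rayon and not Kreuzberg's code: `process_page` and `process_document` are hypothetical names, and Python threads will not give the same speedup as native Rust threads for CPU-bound work. The point is the shape: pages are independent work items, and results are gathered back in page order.

```python
from concurrent.futures import ThreadPoolExecutor


def process_page(page_number: int) -> str:
    # Hypothetical stand-in for per-page work (text extraction, layout inference).
    return f"page {page_number}: ok"


def process_document(page_count: int, max_workers: int = 4) -> list[str]:
    # Each page is processed independently; map() returns results in input order,
    # so the output stays in page order even if pages finish out of order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_page, range(1, page_count + 1)))


if __name__ == "__main__":
    for line in process_document(3):
        print(line)
```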

RT-DETR v2 (Docling Heron) classifies 17 document element types, and layout detection is available in all 12 language bindings. For pages containing tables, Kreuzberg crops each detected table region from the page image and runs SLANet-Plus, a specialized model that predicts the internal structure of tables (rows, columns, and cells). The predicted cell grid is then matched against native PDF text positions to reconstruct accurate Markdown tables.
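The matching step can be pictured roughly as follows. This is a simplified sketch, not Kreuzberg's implementation: the `Box`, `Word`, and `Cell` types, the center-point containment rule, and the top-left coordinate origin are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


@dataclass
class Word:  # a piece of native PDF text with its position (hypothetical type)
    text: str
    box: Box


@dataclass
class Cell:  # one predicted table cell (hypothetical type)
    row: int
    col: int
    box: Box


def fill_grid(cells: list[Cell], words: list[Word]) -> list[list[str]]:
    """Assign each word to the first cell whose box contains the word's center."""
    n_rows = max(c.row for c in cells) + 1
    n_cols = max(c.col for c in cells) + 1
    grid: list[list[list[str]]] = [[[] for _ in range(n_cols)] for _ in range(n_rows)]
    # Visit words top-to-bottom, then left-to-right (assumes a top-left origin).
    for w in sorted(words, key=lambda w: (w.box.y0, w.box.x0)):
        cx = (w.box.x0 + w.box.x1) / 2
        cy = (w.box.y0 + w.box.y1) / 2
        for c in cells:
            if c.box.contains(cx, cy):
                grid[c.row][c.col].append(w.text)
                break
    return [[" ".join(ws) for ws in row] for row in grid]


def to_markdown(grid: list[list[str]]) -> str:
    """Render the grid as a Markdown table, treating the first row as the header."""
    header, *body = grid
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


if __name__ == "__main__":
    cells = [Cell(0, 0, Box(0, 0, 50, 10)), Cell(0, 1, Box(50, 0, 100, 10)),
             Cell(1, 0, Box(0, 10, 50, 20)), Cell(1, 1, Box(50, 10, 100, 20))]
    words = [Word("Name", Box(5, 2, 30, 8)), Word("Qty", Box(55, 2, 70, 8)),
             Word("Widget", Box(5, 12, 40, 18)), Word("3", Box(55, 12, 60, 18))]
    print(to_markdown(fill_grid(cells, words)))
```

Because the cell text comes from the PDF's own text layer rather than from OCR of the cropped image, the table model only has to get the grid geometry right, not the characters.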

Kreuzberg extracts text directly from the PDF's native text layer using pdfium, preserving exact character positions, font metadata (bold, italic, size), and Unicode encoding. Layout detection then classifies and organizes this high-fidelity text according to the document's visual structure. For pages without a native text layer (scanned documents, image-only PDFs), Kreuzberg automatically detects the absence of text and falls back to Tesseract OCR.
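The native-text-first decision might look something like this. It is a sketch under assumptions: the function names, the `min_chars` threshold, and the "count visible characters" heuristic are hypothetical, and the post does not say how Kreuzberg actually detects a missing text layer.

```python
from typing import Callable


def choose_text_source(native_text: str, min_chars: int = 1) -> str:
    """Return "native" if the page has enough visible characters, else "ocr"."""
    visible = sum(1 for ch in native_text if not ch.isspace())
    return "native" if visible >= min_chars else "ocr"


def extract_page(native_text: str, run_ocr: Callable[[], str], min_chars: int = 1) -> str:
    # OCR is passed in as a callable so it only runs when it is actually needed.
    if choose_text_source(native_text, min_chars) == "native":
        return native_text
    return run_ocr()


if __name__ == "__main__":
    print(extract_page("Hello, world", run_ocr=lambda: "<ocr text>"))
    print(extract_page("   \n", run_ocr=lambda: "<ocr text>"))
```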

When a PDF contains a tagged structure tree (common in PDF/A and accessibility-compliant documents), Kreuzberg leverages the author's original paragraph boundaries and heading hierarchy, then applies layout model predictions as classification overrides, getting the best of both worlds.
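One way to picture "structure tree first, layout model as override" is below. Everything here is assumed for illustration: the types, the IoU-based matching, the thresholds, and the label names are not taken from Kreuzberg's source.

```python
from dataclasses import dataclass, replace

Box = tuple[float, float, float, float]  # (x0, y0, x1, y1)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass(frozen=True)
class TaggedBlock:  # a block taken from the PDF's structure tree (hypothetical type)
    text: str
    role: str
    box: Box


@dataclass(frozen=True)
class Detection:  # one layout-model prediction (hypothetical type)
    label: str
    box: Box
    score: float


def apply_overrides(blocks: list[TaggedBlock], detections: list[Detection],
                    min_iou: float = 0.5, min_score: float = 0.5) -> list[TaggedBlock]:
    """Keep the author's text and boundaries; replace only the role when a
    confident detection overlaps the block well enough."""
    result = []
    for block in blocks:
        best = max(detections, key=lambda d: iou(block.box, d.box), default=None)
        if best and iou(block.box, best.box) >= min_iou and best.score >= min_score:
            result.append(replace(block, role=best.label))
        else:
            result.append(block)
    return result
```

The block boundaries and text never change in this sketch; only the classification does, which is the division of labor the paragraph above describes.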

Broken font CMap spacing ("co mputer" → "computer") is fixed. There's also a new multi-backend OCR pipeline with quality-based fallback, PaddleOCR v2 with a unified 18,000+ character multilingual model, and a breaking cleanup of the batch API.
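A quality-based fallback across OCR backends could be structured like this. It is a hedged sketch: the backend callables, the 0-to-1 quality score, and the `min_quality` threshold are assumptions, since the post does not describe how quality is measured.

```python
from typing import Callable, Sequence

OcrResult = tuple[str, float]  # (text, quality score in [0, 1]); scoring is assumed
Backend = tuple[str, Callable[[bytes], OcrResult]]


def ocr_with_fallback(image: bytes, backends: Sequence[Backend],
                      min_quality: float = 0.8) -> tuple[str, str, float]:
    """Try backends in order; return the first result that meets the quality bar,
    otherwise the best result seen. Returns (backend_name, text, quality)."""
    if not backends:
        raise ValueError("at least one OCR backend is required")
    best: tuple[str, str, float] | None = None
    for name, run in backends:
        text, quality = run(image)
        if quality >= min_quality:
            return name, text, quality
        if best is None or quality > best[2]:
            best = (name, text, quality)
    return best


if __name__ == "__main__":
    # Stub backends standing in for real OCR engines.
    backends = [("fast", lambda img: ("c0mputer", 0.55)),
                ("accurate", lambda img: ("computer", 0.93))]
    print(ocr_with_fallback(b"", backends))
```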

If you're running Docling in production, benchmark Kreuzberg against it and let us know what you think! Try it out on GitHub :)

submitted by /u/Eastern-Surround7763