Turbo-OCR for high-volume image and PDF processing

Reddit r/LocalLLaMA / 4/9/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage

Key Points

  • The article describes a performance bottleneck when OCR-ing ~940,000 PDFs, noting that common options like PaddleOCR were limited to ~15 images per second despite strong GPU hardware.
  • It explains that GPU underutilization and Python/single-stream bottlenecks in PaddleOCR prevented efficient throughput, while VLM-based OCR approaches via vLLM achieved only ~2 images per second at high cost.
  • The proposed solution is a custom C++/CUDA inference server wrapping PP-OCRv5-mobile using TensorRT FP16, with multi-stream concurrency and gRPC/HTTP serving to maximize GPU utilization.
  • Reported benchmarks show a large throughput improvement, reaching 100+ img/s on text-heavy pages and 1,000+ img/s on sparse/low-text pages with GPU utilization rising from ~15% to ~99%.
  • The trade-off is reduced layout/table fidelity (no complex reading order or table extraction), with the author recommending VLM OCR alternatives like GLM-OCR or Paddle-VL when layout accuracy is required.

I recently had to process ~940,000 PDFs. I started with the standard OCR tools, but the bottlenecks were frustrating: even on an RTX 5090, throughput was low.

The Problem:

  • PaddleOCR (the most popular open-source OCR): Maxed out at ~15 img/s. GPU utilization hovered around 15%. Its high-performance inference mode doesn't support Blackwell GPUs yet (it needs CUDA < 12.8) and doesn't work with the Latin recognition model either.
  • Any VLM OCR (via vLLM): Great accuracy, but crawled at max 2 img/s. At a million pages, the time/cost was prohibitive.

The Solution: A C++/CUDA Inference Server

PaddleOCR bottlenecks on Python overhead and single-stream execution, so the GPU was barely being used. The fix was a C++ server around the PP-OCRv5-mobile models with TensorRT FP16 and multi-stream concurrency, served via gRPC/HTTP. GPU utilization went from 15% to 99%, multiplying throughput compared to using PaddleOCR's own library. Claude Code and Gemini CLI did most of the coding.

Benchmarks (Linux / RTX 5090 / CUDA 13.1)

  • Text-heavy pages: 100+ img/s
  • Sparse/Low-text pages: 1,000+ img/s

Trade-offs

  1. Accuracy vs. Speed: This trades layout accuracy for raw speed. No multi-column reading order or complex table extraction. If you need that, GLM-OCR, Paddle-VL, or other VLM-based OCRs are better options.

Source for those interested: github.com/aiptimizer/turbo-ocr

submitted by /u/Civil-Image5411