Turbo-OCR Update: Layout Model + Multilingual

Reddit r/LocalLLaMA / 4/27/2026


Key Points

  • TurboOCR’s OCR server has been updated with a new layout detection model, adding PP-StructureV3 to better identify document structure.
  • The system’s multilingual OCR support has expanded beyond Latin scripts to include Chinese, Japanese, Korean, Cyrillic, Arabic, and additional Latin-script languages.
  • The implementation keeps the existing C++/CUDA-based stack using TensorRT FP16, multi-stream processing, and gRPC/HTTP interfaces with a direct PDF endpoint.
  • Reported benchmarks on Linux with an RTX 5090 and CUDA 13.2 indicate very high throughput: 100+ images/s for text-heavy inputs, 1,000+ images/s for sparse/low-text inputs, and roughly 270 pages/s on the FUNSD dataset.
  • The project is published as source code on GitHub (aiptimizer/TurboOCR), making the update directly reusable for high-volume image and PDF OCR workflows.

Follow-up to my post 18 days ago about the C++/CUDA OCR server. Two additions:

What's New:

  • Layout model: Added PP-StructureV3 for layout detection
  • Multilingual: No longer Latin-only. Now supports Chinese, Japanese, Korean, Cyrillic, Arabic, and additional Latin-script languages.

Same stack: C++, TensorRT FP16, multi-stream processing, gRPC/HTTP, and a direct PDF endpoint.
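Since the server exposes an HTTP interface with a direct PDF endpoint, calling it from a script should be straightforward. Here's a minimal client sketch using only the Python standard library; the route name (`/ocr/pdf`), port, and JSON response format are assumptions, not the project's documented API, so check the TurboOCR README for the actual paths.

```python
import json
import urllib.request

# Hypothetical server address and route -- adjust to the real TurboOCR config.
SERVER = "http://localhost:8080"

def build_pdf_request(pdf_bytes: bytes, server: str = SERVER) -> urllib.request.Request:
    """Build a POST request that ships raw PDF bytes to the (assumed) /ocr/pdf route."""
    return urllib.request.Request(
        url=f"{server}/ocr/pdf",
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )

def ocr_pdf(pdf_path: str, server: str = SERVER) -> dict:
    """Send a PDF file to the server and return its parsed JSON response."""
    with open(pdf_path, "rb") as f:
        req = build_pdf_request(f.read(), server)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Posting the raw bytes (rather than rendering pages client-side) is what makes a direct PDF endpoint attractive for high-volume pipelines: the server owns rasterization and can keep its CUDA streams saturated.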

Benchmarks (Linux / RTX 5090 / CUDA 13.2):

  • Very text-heavy images: 100+ img/s
  • Sparse/Low-text: 1,000+ img/s
  • 270 pages/s on the FUNSD dataset
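To put those figures in perspective, a quick back-of-envelope calculation (using the reported numbers, not independent measurements) shows what they imply for a large batch job:

```python
# Reported throughput figures from the post (lower bounds where given).
DENSE_IPS = 100    # text-heavy images/s
SPARSE_IPS = 1000  # sparse/low-text images/s

def batch_seconds(n_items: int, items_per_second: float) -> float:
    """Time to process n_items at a given steady-state throughput."""
    return n_items / items_per_second

# A 1M-image, text-heavy corpus at the conservative 100 img/s figure:
print(f"{batch_seconds(1_000_000, DENSE_IPS) / 3600:.1f} h")  # → 2.8 h
```

In other words, even the worst-case (text-heavy) rate would clear a million images in under three hours on a single RTX 5090, which is the main appeal for bulk OCR workflows.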

Source: github.com/aiptimizer/TurboOCR

submitted by /u/Civil-Image5411