Turbo-OCR Update: Layout Model + Multilingual

Reddit r/LocalLLaMA / 4/27/2026


Key Points

  • TurboOCR’s OCR server has been updated with a new layout detection model, adding PP-StructureV3 to better identify document structure.
  • The system’s multilingual OCR support has expanded beyond Latin scripts to include Chinese, Japanese, Korean, Cyrillic, Arabic, and additional Latin-script languages.
  • The implementation keeps the existing C++/CUDA-based stack using TensorRT FP16, multi-stream processing, and gRPC/HTTP interfaces with a direct PDF endpoint.
  • Reported benchmarks on Linux with an RTX 5090 and CUDA 13.2 indicate very high throughput: 100+ images/s for text-heavy inputs, 1,000+ images/s for sparse/low-text inputs, and roughly 270 pages/s on the FUNSD dataset.
  • The project is published as source code on GitHub (aiptimizer/TurboOCR), making the update directly reusable for high-volume image and PDF OCR workflows.

Follow-up to my post 18 days ago about the C++/CUDA OCR server. Two additions:

What's New:

  • Layout model: Added PP-StructureV3 for layout detection
  • Multilingual: No longer Latin-only. Now supports Chinese, Japanese, Korean, Cyrillic, Arabic, and additional Latin-script languages.

Same stack: C++, TensorRT FP16, multi-stream processing, gRPC/HTTP, and a direct PDF endpoint.
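Since the server exposes an HTTP interface with a direct PDF endpoint, calling it from a script should be straightforward. Here's a minimal client sketch using only the Python standard library; the route name (`/ocr/pdf`), port, and JSON response format are assumptions, not the project's documented API, so check the TurboOCR README for the actual paths.

```python
import json
import urllib.request

# Hypothetical server address and route -- adjust to the real TurboOCR config.
SERVER = "http://localhost:8080"

def build_pdf_request(pdf_bytes: bytes, server: str = SERVER) -> urllib.request.Request:
    """Build a POST request that ships raw PDF bytes to the (assumed) /ocr/pdf route."""
    return urllib.request.Request(
        url=f"{server}/ocr/pdf",
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )

def ocr_pdf(pdf_path: str, server: str = SERVER) -> dict:
    """Send a PDF file to the server and return its parsed JSON response."""
    with open(pdf_path, "rb") as f:
        req = build_pdf_request(f.read(), server)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Posting the raw bytes (rather than rendering pages client-side) is what makes a direct PDF endpoint attractive for high-volume pipelines: the server owns rasterization and can keep its CUDA streams saturated.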

Benchmarks (Linux / RTX 5090 / CUDA 13.2):

  • Very text-heavy images: 100+ img/s
  • Sparse/Low-text: 1,000+ img/s
  • 270 pages/s on the FUNSD dataset
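To put those figures in perspective, a quick back-of-envelope calculation (using the reported numbers, not independent measurements) shows what they imply for a large batch job:

```python
# Reported throughput figures from the post (lower bounds where given).
DENSE_IPS = 100    # text-heavy images/s
SPARSE_IPS = 1000  # sparse/low-text images/s

def batch_seconds(n_items: int, items_per_second: float) -> float:
    """Time to process n_items at a given steady-state throughput."""
    return n_items / items_per_second

# A 1M-image, text-heavy corpus at the conservative 100 img/s figure:
print(f"{batch_seconds(1_000_000, DENSE_IPS) / 3600:.1f} h")  # → 2.8 h
```

In other words, even the worst-case (text-heavy) rate would clear a million images in under three hours on a single RTX 5090, which is the main appeal for bulk OCR workflows.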

Source: github.com/aiptimizer/TurboOCR

submitted by /u/Civil-Image5411