Using PaddleOCR-VL-1.5 with llama-server for book OCR

Reddit r/LocalLLaMA / 4/26/2026


Key Points

  • The article describes using PaddleOCR-VL-1.5 (a vision-language model) to perform OCR on book page images via llama.cpp’s llama-server.
  • It reports strong handling of complex page layouts, including tables and mixed text/figure regions, producing structured Markdown with HTML tables.
  • The proposed pipeline is: layout detection → region-level OCR → conversion to Markdown/HTML for tables, enabling end-to-end processing of an entire folder of page photos.
  • A working setup is shared, specifying PaddleOCR-VL-1.5-GGUF with mmproj.gguf and using a Vulkan backend on Windows, along with a reference repository for the workflow.
  • The post ends by inviting others to share their experiments with vision-language models for OCR.

I've been running PaddleOCR-VL-1.5 via llama.cpp's server for OCR on book pages. It handles complex layouts, tables, and mixed text/figure pages surprisingly well.

Setup:
- Model: PaddleOCR-VL-1.5-GGUF + mmproj.gguf
- Backend: llama-server (Vulkan on Windows)
- Pipeline: layout detection → region OCR → Markdown with HTML tables (request sketch below)
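
If you want to try something similar, here's a minimal sketch of a single-page request against llama-server's OpenAI-compatible endpoint. The launch flags, port, prompt, and function name are my own assumptions for illustration, not the repo's actual code; adjust to your llama.cpp build and paths.

```python
# Launch llama-server first (flags illustrative; adjust to your build/paths):
#   llama-server -m PaddleOCR-VL-1.5.gguf --mmproj mmproj.gguf --port 8080
import base64
import requests  # pip install requests

def ocr_page(image_path: str, server: str = "http://localhost:8080") -> str:
    """Send one page image to llama-server's OpenAI-compatible chat endpoint."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    payload = {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                {"type": "text",
                 "text": "OCR this page. Return Markdown; use HTML for tables."},
            ],
        }],
        "temperature": 0.0,  # deterministic decoding suits OCR
    }
    r = requests.post(f"{server}/v1/chat/completions", json=payload, timeout=300)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]
```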

The pipeline processes an entire folder of page photos end-to-end, so you can essentially digitise a book with a single command.
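
As a rough sketch of what that folder-level driver could look like, reusing `ocr_page` from the snippet above; the glob pattern, page separator, and output file name are illustrative, not what the repo actually does:

```python
# Batch driver: OCR every page image in a folder and stitch the results
# into one Markdown file. Assumes pages sort correctly by file name.
from pathlib import Path

def ocr_book(folder: str, out_file: str = "book.md") -> None:
    pages = sorted(Path(folder).glob("*.jpg"))  # keep pages in reading order
    with open(out_file, "w", encoding="utf-8") as out:
        for page in pages:
            out.write(ocr_page(str(page)))
            out.write("\n\n---\n\n")  # page separator

ocr_book("scans/")  # point at your folder of page photos (path illustrative)
```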

Repo: https://github.com/akmalayari/ocr-book

Has anyone else experimented with vision-language models for OCR?

submitted by /u/Final-Frosting7742