A Reliability Evaluation of Hybrid Deterministic-LLM Based Approaches for Academic Course Registration PDF Information Extraction
arXiv cs.AI / 4/2/2026
Key Points
- The paper evaluates the reliability of three approaches for extracting information from academic course registration (KRS) PDFs: LLM-only, hybrid regex+LLM, and a Camelot table-parsing pipeline with LLM fallback.
- Experiments cover 140 test documents for the LLM-only evaluation and 860 documents for the Camelot pipeline, spanning four study programs with varied table and metadata layouts.
- Three 12–14B open models (Gemma 3, Phi 4, Qwen 2.5) were run locally with Ollama on a consumer CPU (no GPU), making the study relevant for computationally constrained environments.
- Using exact match (EM) and Levenshtein similarity (LS, threshold 0.7) as metrics, the Camelot pipeline with LLM fallback achieved the best accuracy (EM/LS up to roughly 0.99–1.00) and typically processed each PDF in under 1 second.
- The results indicate that hybrid deterministic-plus-LLM strategies are more efficient than LLM-only extraction, particularly for deterministic metadata, with Qwen 2.5:14b showing the most consistent performance.
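To make the two metrics concrete, the sketch below scores extracted fields with exact match and with a normalized Levenshtein similarity checked against the 0.7 threshold. It assumes LS is normalized as 1 − distance / max(len), and that a field counts as correct under LS when its similarity reaches the threshold; the paper's exact normalization and aggregation may differ. The function names and field names are hypothetical.

```python
from typing import Dict, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string as the row
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def score_fields(predicted: Dict[str, str], expected: Dict[str, str],
                 threshold: float = 0.7) -> Tuple[float, float]:
    """Return (EM accuracy, LS accuracy) over the expected fields."""
    n = len(expected)
    em = sum(predicted.get(k, "") == v for k, v in expected.items()) / n
    ls = sum(levenshtein_similarity(predicted.get(k, ""), v) >= threshold
             for k, v in expected.items()) / n
    return em, ls
```

For example, with expected values `{"student_id": "12345678", "name": "Budi Santoso", "course": "Data Structures"}` and a prediction that gets the ID right but returns `"Budi Santosa"` and `"Data Structure"`, EM is 1/3 while LS is 1.0, because each near-miss differs by a single character. This gap between the two scores is the kind of near-miss the thresholded similarity is meant to tolerate.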
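The hybrid strategy can be illustrated as "deterministic first, LLM only on failure". The sketch below assumes metadata fields that a regex can usually capture, and it calls a caller-supplied fallback only when the pattern does not match. The field labels, regex patterns, and the `llm_fallback` callable are hypothetical stand-ins. In practice the fallback might query a locally served model, but no specific Ollama or Camelot call is implied here.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical patterns for deterministic metadata on a registration form.
PATTERNS: Dict[str, str] = {
    "student_id": r"Student ID\s*:\s*(\d+)",
    "name": r"Name\s*:\s*(.+)",          # '.' stops at the end of the line
    "semester": r"Semester\s*:\s*(\S+)",
}

# The fallback receives (field name, full document text) and returns a value or None.
Fallback = Callable[[str, str], Optional[str]]


def extract_metadata(text: str, llm_fallback: Fallback) -> Dict[str, Tuple[Optional[str], str]]:
    """Map each field to (value, source), where source is 'regex' or 'llm'."""
    results: Dict[str, Tuple[Optional[str], str]] = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            # Cheap deterministic path: no model call needed.
            results[field] = (match.group(1).strip(), "regex")
        else:
            # Expensive path: defer to the (hypothetical) LLM only for this field.
            results[field] = (llm_fallback(field, text), "llm")
    return results
```

Because the model is invoked only for fields the regex misses, documents with a predictable metadata layout never reach the LLM at all. This illustrates why hybrid pipelines can be much faster than LLM-only extraction on CPU-only hardware.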