HalalBench: A Multilingual OCR Benchmark for Food Packaging Ingredient Extraction
arXiv cs.CV / 4/28/2026
Key Points
- HalalBench is a new open multilingual OCR benchmark focused specifically on food packaging ingredient label extraction, addressing the lack of standardized evaluation for this use case.
- The benchmark includes 1,043 images (50 real and 993 synthetic) with 36,438 COCO-format annotations across 14 languages, reflecting real-world challenges like curved packaging surfaces and dense multilingual text.
- Four OCR engines were evaluated, including docTR, ML Kit, and EasyOCR. Overall F1 scores were low, ranging from roughly 0.167 to 0.193, and Japanese text was a complete failure (F1 = 0.000).
- A post-processing clustering ablation improved F1 by 36%, and results are validated with HalalLens, a production halal scanner deployed across 20+ countries.
- The dataset and code are released under open licenses to enable further research and benchmarking in food packaging OCR.
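
To make the reported F1 figures concrete, here is a minimal sketch of how an F1 score can be computed for extracted ingredient text. It assumes exact-match comparison over multisets of tokens; this summary does not state the matching protocol HalalBench actually uses, so the function `token_f1` and the sample ingredient lists are hypothetical illustrations, not the benchmark's evaluation code.

```python
from collections import Counter


def token_f1(predicted, reference):
    """Return (precision, recall, F1) for two token lists.

    Assumption: a predicted token counts as correct only if it matches a
    reference token exactly. The benchmark's real matching rule may differ.
    """
    pred_counts = Counter(predicted)
    ref_counts = Counter(reference)
    # Multiset intersection: each token is credited at most as many times
    # as it appears in both lists.
    true_pos = sum((pred_counts & ref_counts).values())
    if true_pos == 0:
        # Covers empty inputs and zero overlap without dividing by zero.
        return 0.0, 0.0, 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(reference)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: the OCR engine misreads "palm" and misses "gelatin".
reference = ["sugar", "palm", "oil", "cocoa", "gelatin"]
predicted = ["sugar", "pa1m", "oil", "cocoa"]
precision, recall, f1 = token_f1(predicted, reference)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Under this scheme a single misread character costs the whole token, which is one reason scores on dense, curved packaging text can fall well below what the same engines achieve on clean documents.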