DharmaOCR: Open-Source Specialized SLM (3B) + Cost–Performance Benchmark against LLMs and other open-sourced models [R]

Reddit r/MachineLearning / 4/25/2026

Hey everyone, we just open-sourced DharmaOCR on Hugging Face. Models and datasets are all public, free to use and experiment with.

We also published the paper documenting all the experimentation behind it, for those who want to dig into the methodology.

We fine-tuned open-source SLMs (3B and 7B parameters) using SFT + DPO and ran them against GPT-5.4, Gemini 3.1 Pro, Claude Opus 4.6, Google Document AI, and open-source alternatives like OlmOCR, Deepseek-OCR, GLMOCR, and Qwen3.
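For anyone unfamiliar with DPO: it trains directly on preference pairs (a chosen and a rejected response per prompt), with no separate reward model. A minimal sketch of the standard per-pair DPO objective in pure Python (variable names and the numbers below are illustrative, not from the paper):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for a single preference pair.

    Inputs are summed log-probabilities of the chosen/rejected
    responses under the policy being trained and under a frozen
    reference model (typically the SFT checkpoint).
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen response over the rejected one, relative to the reference.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)): near zero when the policy already
    # prefers the chosen response, large when it prefers the rejected one.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy favors the chosen response more than the reference does -> low loss.
low = dpo_loss(-10.0, -30.0, ref_logp_chosen=-12.0, ref_logp_rejected=-14.0)
# Policy favors the rejected response -> high loss.
high = dpo_loss(-30.0, -10.0, ref_logp_chosen=-14.0, ref_logp_rejected=-12.0)
```

In practice you'd use a trainer library rather than hand-rolling this, but the objective itself is just the one line above.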

- The specialized models came out on top: 0.925 (7B) and 0.911 (3B).

- DPO using the model's own degenerate outputs as rejected examples cut the failure rate by 87.6%.

- AWQ quantization drops per-page inference cost by ~22%, with negligible impact on benchmark performance.
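On the DPO result: the idea is to mine the SFT model's own failure modes as the rejected side of each pair, with the ground-truth transcription as chosen. The paper has the actual rejection criteria; the sketch below uses a hypothetical n-gram repetition heuristic (repetition loops are a classic OCR degeneration) just to show the pair-construction shape:

```python
def looks_degenerate(text, n=3, max_ngram_repeats=4):
    """Flag outputs where some n-gram repeats many times, a common
    OCR failure mode (repetition loops / stutter).
    NOTE: stand-in heuristic; the paper's criteria may differ.
    """
    tokens = text.split()
    counts = {}
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        counts[gram] = counts.get(gram, 0) + 1
    return max(counts.values(), default=0) > max_ngram_repeats

def build_dpo_pairs(samples):
    """samples: dicts with 'prompt', 'reference' (ground-truth text)
    and 'model_output' (the SFT model's own transcription).

    Keeps only the samples where the model degenerated, pairing the
    ground truth (chosen) against the model's failure (rejected).
    """
    pairs = []
    for s in samples:
        if looks_degenerate(s["model_output"]):
            pairs.append({
                "prompt": s["prompt"],
                "chosen": s["reference"],
                "rejected": s["model_output"],
            })
    return pairs
```

The resulting `{prompt, chosen, rejected}` records are the standard input format for DPO training.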

Models & datasets: https://huggingface.co/Dharma-AI

Full paper: https://arxiv.org/abs/2604.14314

Paper summary: https://gist.science/paper/2604.14314

submitted by /u/augusto_camargo3