AI Navigate

I fine-tuned Qwen3.5-2B for OCR

Reddit r/LocalLLaMA / 3/12/2026

📰 News · Tools & Practical Usage · Models & Research

Key Points

  • The author has fine-tuned the Qwen3.5-2B vision-language model specifically for English left-to-right document OCR tasks.
  • The fine-tuned model is publicly available on Hugging Face under the repository 'loay/English-Document-OCR-Qwen3.5-2B'.
  • The author is seeking user feedback on the model's performance, particularly on challenging or messy documents and edge cases.
  • This release aims to improve OCR capabilities by leveraging a large multimodal language model fine-tuned for document text extraction.

Hey everyone,

I’ve been working on fine-tuning vision-language models for OCR tasks and wanted to share my latest release. It's a fine-tuned Qwen3.5-2B specifically optimized for English/LTR Document OCR.

Model link: loay/English-Document-OCR-Qwen3.5-2B
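The post doesn't include inference code, but a fine-tuned VLM like this can typically be run with the standard Hugging Face `transformers` image-text-to-text pipeline. The sketch below is an assumption, not the author's documented usage: the repo id comes from the post, while the prompt, the `AutoModelForImageTextToText` class, and all parameters are guesses based on how similar Qwen VLM checkpoints are usually loaded — check the model card for the actual instructions.

```python
# Hypothetical sketch: running document OCR with the released checkpoint.
# Heavy imports are kept inside the function so the file can be imported
# without pulling in torch/transformers until you actually run OCR.

def run_ocr(image_path: str,
            model_id: str = "loay/English-Document-OCR-Qwen3.5-2B") -> str:
    """Extract text from a document image. All API details are assumptions."""
    import torch
    from PIL import Image
    from transformers import AutoModelForImageTextToText, AutoProcessor

    # Assumed: the checkpoint ships a processor and supports the
    # image-text-to-text auto class, like other Qwen VLM releases.
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForImageTextToText.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    image = Image.open(image_path)
    # Hypothetical OCR prompt; the model card may specify a required one.
    messages = [{"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Extract all text from this document."},
    ]}]
    prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

    out = model.generate(**inputs, max_new_tokens=1024)
    # Decode only the newly generated tokens, not the prompt.
    new_tokens = out[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
```

Usage would then be as simple as `print(run_ocr("scan.png"))`, keeping in mind a ~2B-parameter VLM still wants a GPU for reasonable latency.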

I’d love to hear your feedback, especially if you test it out on messy documents or specific edge cases. Let me know how it performs for you!

submitted by /u/Other-Confusion2974