Hi everyone, I am working on a proof of concept for an OCR system that can recognize both handwritten and printed Hindi (Devanagari) text in complex documents. I'm trying to build on top of TrOCR. The core problem I'm running into is on the decoder/tokenizer side: TrOCR's default decoder and tokenizer are trained for English only, and I need Hindi output.

What I've tried so far: I replaced TrOCR's decoder with Google's mT5 multilingual decoder. However, the model failed to overfit even on a single data point. The loss comes down but hovers around 2-3 at the end, and characters keep repeating instead of forming a meaningful word or sentence. I have tried changing the learning rate and introducing a repetition penalty, but overfitting just doesn't happen.

I need guidance: is there any other tokenizer out there that works well with TrOCR's encoder, or can you help me improve the current setup (TrOCR's encoder + mT5 decoder)?
Looking for guidance. Trying to create a model with TrOCR's encoder + Google's mT5 multilingual decoder but model fails to overfit on a single data sample
Reddit r/LocalLLaMA / 3/26/2026
💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research
Key Points
- A developer is attempting to build an OCR proof of concept for handwritten and printed Hindi (Devanagari) by combining TrOCR’s vision encoder with a Google mT5 multilingual decoder for Hindi tokenization.
- Despite matching hidden sizes and substituting the decoder, the combined model fails to overfit a single training example, with loss plateauing around 2–3 and outputs degenerating into repeated characters rather than coherent text.
- They report trying typical training adjustments (learning rate changes and repetition penalties) but still cannot achieve overfitting, suggesting a fundamental mismatch or training/labeling issue in the encoder-decoder integration.
- The request asks for guidance on better tokenizer/decoder options compatible with TrOCR’s encoder or for recommendations to fix the current TrOCR encoder + mT5 decoder setup so it can learn Hindi outputs.
- The discussion centers on practical troubleshooting for seq2seq OCR architecture compatibility, tokenization, and decoder conditioning rather than a new model release or result.
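
One check that often helps when a seq2seq model cannot overfit a single sample is to confirm that the labels and the decoder inputs are aligned for teacher forcing. The sketch below is not taken from the post and does not claim to be the cause of this failure. It is a plain-Python illustration under stated assumptions: T5-style conventions (pad id 0, end-of-sequence id 1, decoder start token equal to the pad token) and made-up token ids. The helper names `mask_padding`, `shift_right`, and `is_aligned` are hypothetical.

```python
from typing import List

IGNORE_INDEX = -100  # label value that PyTorch's CrossEntropyLoss skips by default


def mask_padding(token_ids: List[int], pad_token_id: int) -> List[int]:
    """Replace padding positions with IGNORE_INDEX so they add nothing to the loss."""
    return [IGNORE_INDEX if t == pad_token_id else t for t in token_ids]


def shift_right(labels: List[int], decoder_start_token_id: int,
                pad_token_id: int) -> List[int]:
    """Build decoder inputs: start token first, then the labels delayed by one step."""
    shifted = [decoder_start_token_id] + labels[:-1]
    # Masked label positions must become real token ids before being embedded.
    return [pad_token_id if t == IGNORE_INDEX else t for t in shifted]


def is_aligned(decoder_input_ids: List[int], labels: List[int],
               decoder_start_token_id: int) -> bool:
    """True if the input at step t+1 equals the label at step t for unmasked labels."""
    if len(decoder_input_ids) != len(labels):
        return False
    if decoder_input_ids[0] != decoder_start_token_id:
        return False
    for t in range(len(labels) - 1):
        if labels[t] != IGNORE_INDEX and decoder_input_ids[t + 1] != labels[t]:
            return False
    return True


# Assumed T5-style special ids; the other token ids are invented for illustration.
PAD_ID, EOS_ID, START_ID = 0, 1, 0

target = [259, 1234, 5678, EOS_ID, PAD_ID, PAD_ID]  # one padded target sequence
labels = mask_padding(target, PAD_ID)
decoder_input_ids = shift_right(labels, START_ID, PAD_ID)

print(labels)             # [259, 1234, 5678, 1, -100, -100]
print(decoder_input_ids)  # [0, 259, 1234, 5678, 1, 0]
print(is_aligned(decoder_input_ids, labels, START_ID))  # True
```

If a check like this fails on the real batch (for example, the labels are fed unshifted as decoder inputs, or padding is left unmasked), the loss can plateau even on one sample. Note also that a repetition penalty is applied at generation time only, so it would not be expected to change the training loss.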