Looking for guidance. Trying to create a model with TrOCR's encoder + Google's mT5 multilingual decoder but model fails to overfit on a single data sample

Reddit r/LocalLLaMA / 3/26/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • A developer is attempting to build an OCR proof of concept for handwritten and printed Hindi (Devanagari) by combining TrOCR’s vision encoder with a Google mT5 multilingual decoder for Hindi tokenization.
  • Despite matching hidden sizes and substituting the decoder, the combined model fails to overfit a single training example, with loss plateauing around 2–3 and outputs degenerating into repeated characters rather than coherent text.
  • They report trying typical training adjustments (learning rate changes and repetition penalties) but still cannot achieve overfitting, suggesting a fundamental mismatch or training/labeling issue in the encoder-decoder integration.
  • The request asks for guidance on better tokenizer/decoder options compatible with TrOCR’s encoder or for recommendations to fix the current TrOCR encoder + mT5 decoder setup so it can learn Hindi outputs.
  • The discussion centers on practical troubleshooting for seq2seq OCR architecture compatibility, tokenization, and decoder conditioning rather than a new model release or result.

Hi everyone,

I am working on a proof-of-concept OCR system that can recognize both handwritten and printed Hindi (Devanagari) text in complex documents. I’m trying to build on top of TrOCR (microsoft/trocr-base-handwritten), since it already has a strong vision encoder trained for handwriting recognition.

The core problem I’m running into is on the decoder/tokenizer side — TrOCR’s default decoder and tokenizer are trained for English only, and I need Hindi output.

What I’ve tried so far:

I replaced TrOCR’s decoder with google/mt5-small, which natively supports Hindi tokenization. The hidden sizes matched, so I expected this to work.
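One wrinkle worth noting: transformers has no causal-LM class for mT5, so `VisionEncoderDecoderModel.from_encoder_decoder_pretrained` cannot load mt5-small as a decoder directly. An alternative wiring is to keep the full `MT5ForConditionalGeneration` and feed the ViT image features in via `encoder_outputs`, bypassing mT5's own text encoder. The sketch below (an assumption about how the combination could be wired, not the poster's exact code) uses tiny randomly initialized configs so the shapes can be checked without downloading the real checkpoints; swapping in `from_pretrained("microsoft/trocr-base-handwritten")` / `from_pretrained("google/mt5-small")` would give the actual models:

```python
# Hedged sketch: ViT encoder (TrOCR-style) feeding an mT5 decoder.
# Tiny random configs stand in for the real checkpoints; the hidden
# sizes are matched, as in the post. Assumes torch + transformers.
import torch
from transformers import ViTConfig, ViTModel, MT5Config, MT5ForConditionalGeneration

HIDDEN = 64  # encoder hidden size must equal mT5's d_model for cross-attention

encoder = ViTModel(ViTConfig(hidden_size=HIDDEN, num_hidden_layers=2,
                             num_attention_heads=4, intermediate_size=128,
                             image_size=32, patch_size=8))
mt5 = MT5ForConditionalGeneration(MT5Config(d_model=HIDDEN, d_kv=16, d_ff=128,
                                            num_layers=2, num_decoder_layers=2,
                                            num_heads=4, vocab_size=250))

pixel_values = torch.randn(1, 3, 32, 32)
image_states = encoder(pixel_values).last_hidden_state  # (1, patches+1, HIDDEN)

labels = torch.tensor([[17, 23, 1]])  # made-up target token ids, EOS = 1
# Pass the image features where mT5's text-encoder output would normally go;
# with `labels` set, mT5 builds decoder_input_ids by shifting right itself.
out = mt5(encoder_outputs=(image_states,), labels=labels)
print(out.loss.item())
```

With the real checkpoints, only the cross-attention (and the ViT, if fine-tuned) has to adapt, so a single sample should be trivially memorizable once the plumbing is right.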

However, the model fails to overfit even a single data point. The loss comes down but plateaus around 2–3, and the output degenerates into repeated characters instead of a meaningful word or sentence. I have tried changing the learning rate and adding a repetition penalty, but overfitting just doesn’t happen.
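A loss that floors around 2–3 with repeated characters is often a label-preparation problem rather than an architecture one. For mT5, the decoder input is the label sequence shifted right with the decoder start token (mT5 reuses its pad token, id 0), and padding positions in the labels must be replaced with -100 so cross-entropy ignores them; forgetting the -100 masking teaches the model to emit padding/repeats. A minimal illustration with made-up token ids:

```python
PAD_ID = 0          # mT5's pad token id (also its decoder start token)
IGNORE_INDEX = -100 # value ignored by PyTorch's CrossEntropyLoss

def shift_right(labels, decoder_start_token_id=PAD_ID):
    """Build decoder_input_ids from labels (teacher forcing)."""
    return [decoder_start_token_id] + labels[:-1]

def mask_padding(labels, pad_id=PAD_ID):
    """Replace pad positions so the loss skips them."""
    return [IGNORE_INDEX if t == pad_id else t for t in labels]

labels = [1523, 884, 42, 1, 0, 0]   # token ids: text, EOS=1, then padding
print(shift_right(labels))   # [0, 1523, 884, 42, 1, 0]
print(mask_padding(labels))  # [1523, 884, 42, 1, -100, -100]
```

If the HuggingFace model is given `labels` directly it does the shifting itself, but the -100 masking of pad positions is still the caller's job.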

https://preview.redd.it/wh6ucn1mncrg1.png?width=2064&format=png&auto=webp&s=e6cea11021aa84f0d67b74be3a9eb5ffe61c3a74

I need guidance: is there any other tokenizer/decoder out there that works well with TrOCR’s encoder, or can you help me improve the current setup (TrOCR’s encoder + mT5’s decoder) so it learns Hindi output?
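For calibration: when the plumbing is correct, single-example overfitting should drive the loss essentially to zero, not 2–3. This toy loop (a tiny linear classifier, standing in for the OCR model) shows the behaviour to expect from the sanity check:

```python
# Single-example overfit sanity check: loss should collapse toward 0.
import torch

model = torch.nn.Linear(4, 3)                 # stand-in for the real model
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
x = torch.randn(1, 4)                         # one "sample"
y = torch.tensor([2])                         # its label

for _ in range(500):
    loss = torch.nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(loss.item())  # near 0 when everything is wired correctly
```

If a seq2seq setup can't reproduce this on one image/text pair, the usual suspects are label shifting, pad masking, or a tokenizer that maps the target text to `<unk>`.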

submitted by /u/ElectronicHoneydew86