Built a normalizer so WER stops penalizing formatting differences in STT evals! [P]

Reddit r/MachineLearning / 4/24/2026



Key Points

The post describes a problem in STT evaluation: WER penalizes pure formatting differences (e.g., “It’s $50” vs “it is fifty dollars”, or “3:00PM” vs “3 pm”) even when the transcription itself is perfect.
To address this, the authors created a configurable normalization library that normalizes both transcripts before computing WER.
They introduced the open-source project “gladia-normalization,” which runs a YAML-defined, deterministic, version-controllable normalization pipeline (example shown for converting “It’s $50 at 3:00PM” to “it is 50 dollars at 3 pm”).
The library currently includes normalization presets for multiple languages (English, French, German, Italian, Spanish, Dutch), and the team is seeking native speakers to refine non-English behavior.
The project is MIT-licensed and the authors invite others to share how they handle normalization for STT WER evaluation.

Hey guys! At my company, we've been benchmarking STT engines a lot and kept running into the same issue: WER penalizes formatting differences that have nothing to do with actual recognition quality. "It's $50" vs "it is fifty dollars", "3:00PM" vs "3 pm". Both are perfect transcriptions, yet the error rate comes out terrible.
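To make the problem concrete, here is a minimal sketch of a word-level WER calculation (edit distance over whitespace-split words, divided by the reference length). The function name `word_error_rate` is made up for this illustration and is not part of any library mentioned in the post; real scoring tools may tokenize differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# No word matches exactly, so every reference word counts as an error.
print(word_error_rate("it is fifty dollars", "It's $50"))  # 1.0
print(word_error_rate("3 pm", "3:00PM"))                   # 1.0
```

Both pairs score a 100% error rate even though nothing was misheard, which is exactly the noise that normalization is meant to remove.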

The fix is normalizing both sides before scoring, but on every project we had a different script doing it slightly differently. So we built a proper library and open-sourced it.

It's called gladia-normalization: you run your transcripts through a configurable normalization pipeline before you compute WER:

    from normalization import load_pipeline

    pipeline = load_pipeline("gladia-3", language="en")
    pipeline.normalize("It's $50 at 3:00PM")
    # => "it is 50 dollars at 3 pm"

Pipelines are YAML-defined so you know exactly what's running and in what order. Deterministic, version-controllable, customizable.
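As a rough illustration of why an ordered, data-defined pipeline is deterministic and easy to version, here is a toy sketch. Everything in it is an assumption made for the example: the `STEPS` list stands in for what a YAML file might deserialize to, and the step names, fields, and `run_pipeline` function are hypothetical, not the gladia-normalization schema or API. The regex rules only cover this one sentence.

```python
import re

# Hypothetical step list (NOT the gladia-normalization schema).
# Order matters: lowercasing runs first so later patterns can assume lowercase.
STEPS = [
    {"name": "lowercase", "type": "lower"},
    {"name": "expand_its", "type": "regex",
     "pattern": r"\bit's\b", "replace": "it is"},
    {"name": "dollars", "type": "regex",
     "pattern": r"\$(\d+)", "replace": r"\1 dollars"},
    {"name": "on_the_hour", "type": "regex",
     "pattern": r"\b(\d{1,2}):00\s*(am|pm)\b", "replace": r"\1 \2"},
    {"name": "collapse_spaces", "type": "regex",
     "pattern": r"\s+", "replace": " "},
]


def run_pipeline(text: str, steps: list) -> str:
    """Apply each step in list order; same input and steps give the same output."""
    for step in steps:
        if step["type"] == "lower":
            text = text.lower()
        elif step["type"] == "regex":
            text = re.sub(step["pattern"], step["replace"], text)
        else:
            raise ValueError(f"unknown step type: {step['type']}")
    return text.strip()


print(run_pipeline("It's $50 at 3:00PM", STEPS))  # it is 50 dollars at 3 pm
```

Because the steps are plain data, a diff of the config shows exactly which rule changed between two benchmark runs.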

Currently supports English, French, German, Italian, Spanish and Dutch - though we know our non-English presets need refinement and we're actively looking for native speakers to contribute and help get the behavior right for each language 🙌!

MIT licensed, repo here → https://github.com/gladiaio/normalization

Curious how others are handling this. Drop a comment if you've been dealing with the same thing :)

submitted by /u/Karamouche