Defragmenting Language Models: An Interpretability-based Approach for Vocabulary Expansion

arXiv cs.CL · April 21, 2026

📰 News · Models & Research

Key Points

  • The paper studies “token over-fragmentation” in modern open-weight LLMs, where languages written in non-Latin scripts require several times more tokens than English to represent the same information.
  • It proposes an interpretability-based vocabulary expansion approach that revisits two key choices: which vocabulary items to add and how to initialize their input/output embeddings.
  • The authors argue against relying solely on frequency-based candidate selection and show interpretability-based methods achieve better performance-to-token-efficiency trade-offs.
  • They report that interpretability-grounded embedding initialization can yield large gains (around 20 points) over baseline initialization methods for several non-Latin-script languages.
  • Building on an analysis of “subword detokenization” (the model progressively merging fragmented subword tokens into larger units across layers), the paper introduces FragMend to push token efficiency further, validating it against strong baselines with extensive analysis of its design choices.
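To make the over-fragmentation point concrete, here is a toy sketch (not the paper's tokenizer or method): a greedy tokenizer whose vocabulary covers common Latin subwords but falls back to raw UTF-8 bytes for unseen scripts. Because each Devanagari character occupies three UTF-8 bytes, byte fallback inflates the token count for Hindi far beyond English. The vocabulary and example strings are hypothetical.

```python
def toy_tokenize(text, vocab):
    """Greedy longest-match tokenization over `vocab`, with UTF-8 byte fallback.

    This mimics (very roughly) how byte-fallback BPE tokenizers fragment
    text from scripts that are under-represented in the vocabulary.
    """
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest match first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Byte fallback: one token per UTF-8 byte of the character.
            tokens.extend(f"<0x{b:02X}>" for b in text[i].encode("utf-8"))
            i += 1
    return tokens

# Tiny Latin-centric vocabulary (hypothetical).
vocab = {"the", " cat", " sat", " "}

english = "the cat sat"
hindi = "बिल्ली बैठी"  # roughly "the cat sat" in Hindi (Devanagari)

en_tokens = toy_tokenize(english, vocab)
hi_tokens = toy_tokenize(hindi, vocab)
print(len(en_tokens), len(hi_tokens))  # Hindi fragments into many more tokens
```

Vocabulary expansion attacks exactly this gap: adding Devanagari subwords to the vocabulary would collapse those 30 byte tokens into a handful of items.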

Abstract

All languages are equal; when it comes to tokenization, some are more equal than others. Tokens are the hidden currency that dictate the cost and latency of access to contemporary LLMs. However, many languages written in non-Latin scripts observe a poor exchange rate: LLMs take several multiples of tokens to encode the same information in many languages as they do for English. Our analysis reveals that this issue, known as "token over-fragmentation", persists in modern open-weight LLMs. The standard remedy is vocabulary expansion that adds target language items missing from the model's vocabulary. In this work, we comprehensively study and advance interpretability-based vocabulary expansion, a new research direction. We focus on two core decisions in the vocabulary expansion process: What items should we add? and How should we initialize their corresponding input and output embeddings? First, we question the conventional use of frequency-based methods to choose candidate vocabulary items to add (a decision long treated as settled), and show that interpretability-based methods offer a superior performance-token efficiency trade-off. Next, we strengthen the case for interpretability-based embedding initialization by showing large gains (~20 pts) over baseline initialization methods for several languages written in non-Latin scripts. We identify the phenomenon of "subword detokenization" where models progressively merge fragmented subword tokens into larger subwords across layers. Grounded in our analysis of this phenomenon, we propose FragMend to further push the efficiency ceiling of interpretability-based expansion. We validate the effectiveness of FragMend through comparison against strong baselines and we present extensive analysis of its design choices.
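The abstract contrasts interpretability-based embedding initialization with baseline methods. One widely used baseline (assumed here for illustration; the summary does not spell out which baselines the paper uses) initializes each new token's embedding as the mean of the embeddings of the subword pieces the old tokenizer split it into. A minimal sketch with a hypothetical embedding table:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 100 existing tokens with 8-dimensional embeddings.
d_model = 8
old_embeddings = rng.normal(size=(100, d_model))

def init_new_embedding(subword_ids, embeddings):
    """Mean-of-subwords initialization, a common baseline for new tokens.

    Interpretability-based initialization (as advocated in the paper) would
    instead be grounded in model-internal analysis; it is not reproduced here.
    """
    return embeddings[subword_ids].mean(axis=0)

# Suppose a new vocabulary item was previously split into subword ids 3, 17, 42.
new_vec = init_new_embedding([3, 17, 42], old_embeddings)

# Append the new row to form the expanded embedding table.
expanded = np.vstack([old_embeddings, new_vec[None, :]])
print(expanded.shape)  # (101, 8)
```

In practice the same procedure is applied separately to the input and output (unembedding) matrices; the paper's contribution is replacing this mean-based heuristic with an interpretability-grounded choice of both the candidate items and their initial vectors.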