MIPIC: Matryoshka Representation Learning via Self-Distilled Intra-Relational Alignment and Progressive Information Chaining

arXiv cs.CL / 4/28/2026

📰 News · Models & Research

Key Points

  • The paper introduces MIPIC, a unified training framework for learning Matryoshka Representation Learning (MRL) embeddings that remain coherent across embedding dimensions and model depths.
  • MIPIC uses Self-Distilled Intra-Relational Alignment (SIA) to enforce cross-dimension structural consistency by aligning token-level geometric and attention-driven relations between full and truncated representations via top-k CKA self-distillation (see the sketch after this list).
  • It also applies Progressive Information Chaining (PIC) to consolidate semantics across layers by gradually transferring mature task understanding from deeper layers into earlier ones (a sketch follows the abstract below).
  • Experiments on STS, NLI, and classification benchmarks, spanning models from TinyBERT to BGE-M3 and Qwen3, show that MIPIC produces strong Matryoshka representations, with the largest gains at extremely low embedding dimensions.
  • Overall, the work addresses the coordination challenge in MRL—how information is arranged across dimensionality and depth—by providing training strategies for both structural alignment and semantic transfer.
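
To make the SIA idea concrete, here is a minimal PyTorch sketch: build token-token relation matrices for the full and truncated views, keep each token's top-k strongest relations from the full (teacher) view, and penalize 1 - linear CKA between the two sets of relations. The names (`linear_cka`, `sia_loss`), the top-k interpretation, and the restriction to geometric (dot-product) relations rather than attention-driven ones are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def linear_cka(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Linear CKA between two feature matrices of shape (n, d1) and (n, d2)."""
    x = x - x.mean(dim=0, keepdim=True)  # center each feature column
    y = y - y.mean(dim=0, keepdim=True)
    cross = (y.T @ x).norm(p="fro") ** 2  # ||Y^T X||_F^2
    return cross / ((x.T @ x).norm(p="fro") * (y.T @ y).norm(p="fro") + eps)

def sia_loss(full_tokens: torch.Tensor, trunc_dim: int, k: int = 8) -> torch.Tensor:
    """Align token-level relations of a truncated view with the full view.

    full_tokens: (n_tokens, d) token embeddings; trunc_dim: prefix size of the
    nested sub-embedding; k: number of strongest relations kept per token
    (an assumed reading of the paper's "top-k"; requires k <= n_tokens).
    """
    teacher = full_tokens.detach()        # self-distillation: full view as teacher
    student = full_tokens[:, :trunc_dim]  # truncated (Matryoshka) view

    g_t = teacher @ teacher.T             # token-token relation matrices
    g_s = student @ student.T

    idx = g_t.topk(k, dim=-1).indices     # each token's top-k strongest relations
    rel_t = g_t.gather(-1, idx)           # teacher values at those positions
    rel_s = g_s.gather(-1, idx)           # student values at the same positions

    return 1.0 - linear_cka(rel_s, rel_t)  # higher CKA = more similar structure
```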

Abstract

Representation learning is fundamental to NLP, but building embeddings that work well at different computational budgets is challenging. Matryoshka Representation Learning (MRL) offers a flexible inference paradigm through nested embeddings; however, learning such structures requires explicit coordination of how information is arranged across embedding dimensionality and model depth. In this work, we propose MIPIC (Matryoshka Representation Learning via Self-Distilled Intra-Relational Alignment and Progressive Information Chaining), a unified training framework designed to produce structurally coherent and semantically compact Matryoshka representations. MIPIC promotes cross-dimensional structural consistency through Self-Distilled Intra-Relational Alignment (SIA), which aligns token-level geometric and attention-driven relations between full and truncated representations using top-k CKA self-distillation. Complementarily, it enables depth-wise semantic consolidation via Progressive Information Chaining (PIC), a scaffolded alignment strategy that incrementally transfers mature task semantics from deeper layers into earlier layers. Extensive experiments on STS, NLI, and classification benchmarks (spanning models from TinyBERT to BGE-M3 and Qwen3) demonstrate that MIPIC yields Matryoshka representations that are highly competitive across all capacities, with significant performance advantages observed in extremely low-dimensional settings.
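
The PIC component can be pictured as a chain of alignment losses that pull each layer toward its deeper neighbor. The PyTorch sketch below illustrates one way to scaffold this transfer; the function name `pic_loss`, the cosine alignment objective, and the linear schedule that activates more layer pairs as training progresses are assumptions for illustration, and the paper's actual scaffolding may differ.

```python
import torch
import torch.nn.functional as F

def pic_loss(layer_embs: list[torch.Tensor], step: int, total_steps: int) -> torch.Tensor:
    """Chain deeper-layer semantics into earlier layers, one pair at a time.

    layer_embs: per-layer sentence embeddings ordered shallow -> deep,
    each of shape (batch, d). step / total_steps tracks training progress.
    """
    n = len(layer_embs)
    # Number of adjacent layer pairs currently in the chain, growing over
    # training (assumed linear schedule).
    active = max(1, round((step / total_steps) * (n - 1)))

    loss = layer_embs[0].new_zeros(())
    # Start from the deepest pair and extend toward shallower layers.
    for i in range(n - 1, n - 1 - active, -1):
        teacher = layer_embs[i].detach()  # deeper layer: mature task semantics
        student = layer_embs[i - 1]       # shallower layer is pulled toward it
        loss = loss + (1.0 - F.cosine_similarity(student, teacher, dim=-1)).mean()
    return loss / active
```

Early in training only the deepest pair is aligned; the chain then extends toward the input, so shallow layers are never asked to match deep semantics before the intermediate layers have consolidated them.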