ADE: Adaptive Dictionary Embeddings -- Scaling Multi-Anchor Representations to Large Language Models

arXiv cs.CL · April 29, 2026


Key Points

  • The paper introduces Adaptive Dictionary Embeddings (ADE), a framework that scales multi-anchor word representations—previously too inefficient for large models—into large language model architectures.
  • ADE’s core components are Vocabulary Projection (VP), which replaces the expensive two-stage anchor lookup with a single matrix operation; Grouped Positional Encoding (GPE), which shares position information among the anchors of a word; and context-aware anchor reweighting driven by self-attention (see the sketches after this list and after the abstract).
  • ADE is integrated into a Segment-Aware Transformer (SAT) to perform context-aware anchor weighting during inference.
  • On AG News and DBpedia-14, ADE shows strong parameter efficiency, using 98.7% fewer trainable parameters than DeBERTa-v3-base; it surpasses DeBERTa on DBpedia-14 and approaches it on AG News, while compressing the embedding layer by over 40×.
  • Overall, the results suggest multi-anchor representations can be a practical, parameter-efficient alternative to single-vector word embeddings in modern transformers.
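The components above are described only at a high level, so the following PyTorch sketch shows one plausible reading of Vocabulary Projection: folding the two-stage anchor lookup (word → anchor ids → weighted anchor vectors) into a single vocabulary-to-anchor matrix product. All identifiers (`anchor_table`, `word2anchor`, `mix_weights`) and shapes are illustrative assumptions, not the authors' code.

```python
# Hedged sketch of Vocabulary Projection (VP): one reading of the idea,
# not the authors' implementation. All names and shapes are illustrative.
import torch

V, A, K, d = 1000, 64, 4, 32                 # vocab size, #anchors, anchors/word, dim

anchor_table = torch.randn(A, d)             # shared dictionary of anchor vectors
word2anchor  = torch.randint(0, A, (V, K))   # each word's K anchor ids
mix_weights  = torch.softmax(torch.randn(V, K), dim=-1)  # per-word anchor weights

tokens = torch.randint(0, V, (2, 8))         # a batch of token ids (B=2, T=8)

# --- Two-stage lookup (the costly baseline) ---
ids  = word2anchor[tokens]                   # (B, T, K): gather anchor ids
vecs = anchor_table[ids]                     # (B, T, K, d): gather anchor vectors
slow = (mix_weights[tokens].unsqueeze(-1) * vecs).sum(dim=2)   # (B, T, d)

# --- Vocabulary Projection: fold both stages into one matrix product ---
# P[v, a] = total weight of anchor a for word v (sparse in practice, dense here).
P = torch.zeros(V, A).scatter_add_(1, word2anchor, mix_weights)
E = P @ anchor_table                         # (V, d): one matrix operation
fast = E[tokens]                             # per-token cost: a plain embedding lookup

assert torch.allclose(slow, fast, atol=1e-5)
```

In practice P would be stored sparsely, and E can be cached between steps or recomputed as the anchor table trains; either way, the per-token cost collapses to an ordinary embedding lookup.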

Abstract

Word embeddings are fundamental to natural language processing, yet traditional approaches represent each word with a single vector, creating representational bottlenecks for polysemous words and limiting semantic expressiveness. While multi-anchor representations have shown promise by representing words as combinations of multiple vectors, they have been limited to small-scale models due to computational inefficiency and lack of integration with modern transformer architectures. We introduce Adaptive Dictionary Embeddings (ADE), a framework that successfully scales multi-anchor word representations to large language models. ADE makes three key contributions: (1) Vocabulary Projection (VP), which transforms the costly two-stage anchor lookup into a single efficient matrix operation; (2) Grouped Positional Encoding (GPE), a novel positional encoding scheme in which anchors of the same word share positional information, preserving semantic coherence while enabling anchor-level variation; and (3) context-aware anchor reweighting, which leverages self-attention to dynamically compose anchor contributions based on sequence context. We integrate these components into the Segment-Aware Transformer (SAT), which provides context-aware reweighting of anchor contributions at inference time. We evaluate ADE on the AG News and DBpedia-14 text classification benchmarks. With 98.7% fewer trainable parameters than DeBERTa-v3-base, ADE surpasses DeBERTa on DBpedia-14 (98.06% vs. 97.80%) and approaches it on AG News (90.64% vs. 94.50%), while compressing the embedding layer by over 40×, demonstrating that multi-anchor representations are a practical and parameter-efficient alternative to single-vector embeddings in modern transformer architectures.
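To make GPE and the reweighting step concrete, here is a hedged PyTorch sketch under stated assumptions: each word expands into K anchor tokens, all K share the word's position index (the GPE idea), and a single attention-style gate stands in for the full self-attention reweighting performed inside SAT. The positional-encoding variant and every identifier here are our choices, not details from the paper.

```python
# Hedged sketch of Grouped Positional Encoding (GPE) plus context-aware anchor
# reweighting. Assumptions, not the authors' code: a concatenated sin/cos
# positional encoding and a mean-pooled context vector as the attention query.
import math
import torch
import torch.nn.functional as F

B, T, K, d = 2, 8, 4, 32          # batch, words per sequence, anchors per word, dim

def sinusoidal_pe(positions, d):
    """Concatenated sin/cos encoding for an integer position tensor."""
    inv_freq = torch.exp(-math.log(10000.0) * torch.arange(0, d, 2) / d)
    angles = positions.unsqueeze(-1).float() * inv_freq       # (..., d/2)
    return torch.cat([angles.sin(), angles.cos()], dim=-1)    # (..., d)

anchors = torch.randn(B, T, K, d)  # per-word anchor vectors (e.g., from VP)

# GPE: every anchor of word t gets position t, so the K anchors of one word
# share positional information instead of consuming K distinct positions.
word_pos = torch.arange(T).view(1, T, 1).expand(B, T, K)      # (B, T, K)
x = anchors + sinusoidal_pe(word_pos, d)                      # anchor-level tokens

# Context-aware reweighting: score each anchor against a pooled sequence
# context and softmax over the K anchors of each word.
context = x.mean(dim=(1, 2), keepdim=True)                    # (B, 1, 1, d)
scores  = (x * context).sum(-1) / math.sqrt(d)                # (B, T, K)
alpha   = F.softmax(scores, dim=-1)                           # anchor weights per word
word_repr = (alpha.unsqueeze(-1) * x).sum(dim=2)              # (B, T, d)
print(word_repr.shape)                                        # torch.Size([2, 8, 32])
```

The gate here is deliberately minimal; in ADE the reweighting comes from SAT's self-attention over the full sequence, so the weights of a word's anchors can differ per occurrence.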