
Spectral Tempering for Embedding Compression in Dense Passage Retrieval

arXiv cs.AI / March 23, 2026


Key Points

  • The paper analyzes dimensionality reduction for dense retrieval embeddings and shows that the optimal spectral scaling strength γ is not a global constant: it varies with the target dimensionality k and is governed by the signal-to-noise ratio of the retained subspace.
  • It introduces Spectral Tempering (SpecTemp), a learning-free method that derives an adaptive γ(k) directly from the corpus eigenspectrum using local SNR analysis and knee-point normalization, requiring no labeled data.
  • SpecTemp is model-agnostic and matches the near-oracle performance of a grid-searched γ*(k) while avoiding learning and validation-based hyperparameter tuning.
  • The authors release public code and demonstrate through extensive experiments that SpecTemp compresses embeddings for dense passage retrieval with minimal performance loss.
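To make the PCA-whitening trade-off concrete, here is a minimal sketch of the spectral scaling family the summary describes: project onto the top-k principal directions, then rescale each retained dimension by its eigenvalue raised to -γ/2, so γ = 0 gives plain PCA and γ = 1 gives whitening. The exact parameterization is an illustrative assumption, not necessarily the paper's formulation.

```python
import numpy as np

def spectral_scale(X, k, gamma, eps=1e-12):
    """Project centered embeddings onto the top-k principal directions and
    rescale dimension i by eigval_i ** (-gamma / 2).

    gamma = 0 recovers the PCA projection (dominant variance preserved);
    gamma = 1 recovers whitening (isotropic, but noise dims get amplified).
    Illustrative parameterization only -- not the paper's exact method.
    """
    Xc = X - X.mean(axis=0)                       # center the corpus
    # eigendecomposition of the covariance via SVD of the data matrix;
    # singular values come back sorted in descending order
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = (S ** 2) / (len(X) - 1)             # covariance eigenvalues
    Z = Xc @ Vt[:k].T                             # top-k PCA scores
    return Z * (eigvals[:k] + eps) ** (-gamma / 2.0)

# toy corpus: 200 embeddings in 32 dims with a decaying spectrum, kept at k = 8
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32)) * np.linspace(3.0, 0.1, 32)
Z_pca   = spectral_scale(X, k=8, gamma=0.0)   # variance-preserving extreme
Z_white = spectral_scale(X, k=8, gamma=1.0)   # isotropic extreme
```

Intermediate values 0 < γ < 1 interpolate between the two extremes; the paper's point is that the best γ shifts with k, which is what SpecTemp's adaptive γ(k) is designed to track.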

Abstract

Dimensionality reduction is critical for deploying dense retrieval systems at scale, yet mainstream post-hoc methods face a fundamental trade-off: principal component analysis (PCA) preserves dominant variance but underutilizes representational capacity, while whitening enforces isotropy at the cost of amplifying noise in the heavy-tailed eigenspectrum of retrieval embeddings. Intermediate spectral scaling methods unify these extremes by reweighting dimensions with a power coefficient γ, but treat γ as a fixed hyperparameter that requires task-specific tuning. We show that the optimal scaling strength γ is not a global constant: it varies systematically with target dimensionality k and is governed by the signal-to-noise ratio (SNR) of the retained subspace. Based on this insight, we propose Spectral Tempering (SpecTemp), a learning-free method that derives an adaptive γ(k) directly from the corpus eigenspectrum using local SNR analysis and knee-point normalization, requiring no labeled data or validation-based search. Extensive experiments demonstrate that Spectral Tempering consistently achieves near-oracle performance relative to grid-searched γ*(k) while remaining fully learning-free and model-agnostic. Our code is publicly available at https://anonymous.4open.science/r/SpecTemp-0D37.
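The abstract's "knee-point normalization" refers to locating where a sorted eigenspectrum bends from signal directions into a noise floor. A common generic heuristic for this is the max-distance-to-chord rule (the idea behind the Kneedle algorithm); the sketch below uses it purely for intuition, since the paper's exact normalization is not specified here.

```python
import numpy as np

def knee_point(eigvals):
    """Return the index of the knee in a descending eigenspectrum, taken as
    the point with maximum perpendicular distance to the chord joining the
    first and last eigenvalues. Generic heuristic, shown for intuition only;
    the paper's knee-point normalization may differ.
    """
    y = np.asarray(eigvals, dtype=float)
    x = np.arange(len(y), dtype=float)
    dx, dy = x[-1] - x[0], y[-1] - y[0]          # chord direction
    # perpendicular distance from each (x_i, y_i) to the chord
    dist = np.abs(dy * x - dx * y + dx * y[0] - dy * x[0]) / np.hypot(dx, dy)
    return int(np.argmax(dist))

# heavy-tailed toy spectrum: a few strong signal directions plus a noise floor
eigvals = np.concatenate([np.array([10.0, 6.0, 3.0, 1.5]), np.full(28, 0.1)])
k_knee = knee_point(eigvals)   # lands at the signal/noise boundary
```

Locating this boundary without labels is what lets a method like SpecTemp calibrate γ(k) from the corpus eigenspectrum alone, rather than from a validation grid search.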