Kernel Affine Hull Machines for Compute-Efficient Query-Side Semantic Encoding

arXiv cs.LG / 5/6/2026


Key Points

  • The paper tackles semantic retrieval settings where the online cost is dominated by query-side transformer encoding, proposing a way to avoid repeated neural inference.
  • It introduces Kernel Affine Hull Machines (KAHMs), which estimate prototype-mixture weights using a rigorously defined RKHS and refine prototypes via normalized least-mean-squares to map cheap lexical features into a frozen teacher embedding space.
  • The method provides an analytically explicit decomposition of encoding error into posterior approximation, generalization, and teacher-noise components, improving interpretability.
  • On an Austrian-law benchmark (5,000 queries), KAHMs achieve the strongest teacher-space reconstruction among matched learned query adapters (MSE 0.000091, R² 0.9071, cosine 0.9536).
  • KAHMs also improve rank-based retrieval metrics (MRR@20 0.504, Hit@20 0.694, Top-1 0.411) and cut per-query latency by 8.5× versus direct transformer encoding.
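The mapping described in the second bullet can be sketched in a few lines: kernel weights over a set of lexical anchors, normalized to sum to one so the output lies in the affine hull of teacher-space prototypes, with a normalized least-mean-squares (NLMS) step to refine those prototypes. The kernel choice (RBF), dimensions, prototype count, and step size below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): lexical feature dim, teacher embedding dim, prototypes.
D_LEX, D_EMB, K = 32, 16, 8

# Prototype pairs: lexical anchors and their teacher-space prototypes (random stand-ins
# for what the paper would fit from data).
lex_protos = rng.normal(size=(K, D_LEX))
emb_protos = rng.normal(size=(K, D_EMB))

def mixture_weights(x, gamma=0.5):
    """Kernel responsibility of each lexical anchor for query features x,
    normalized to sum to 1 (so the prediction is an affine/convex mixture)."""
    k = np.exp(-gamma * ((lex_protos - x) ** 2).sum(axis=1))
    return k / k.sum()

def encode(x):
    """Map cheap lexical features into teacher space as a prototype mixture."""
    w = mixture_weights(x)
    return w @ emb_protos, w

def nlms_refine(x, teacher_emb, mu=0.5, eps=1e-8):
    """One normalized-LMS step: move the teacher-space prototypes so the
    predicted embedding approaches the frozen teacher's embedding; the step
    size is normalized by the squared mixture-weight norm."""
    global emb_protos
    y, w = encode(x)
    err = teacher_emb - y                                  # teacher-space residual
    emb_protos = emb_protos + mu * np.outer(w, err) / (w @ w + eps)
    return float(err @ err)                                # squared error before the step
```

Repeated `nlms_refine` steps on the same (features, teacher embedding) pair shrink the residual geometrically, since each normalized update moves the prediction roughly a fraction `mu` of the way toward the target.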

Abstract

Transformer-based semantic retrieval is highly effective, yet in many deployments the dominant cost lies in online query encoding rather than corpus indexing. We study the fixed-teacher query-adaptation problem and ask whether repeated neural inference can be replaced by a lightweight, analytically explicit estimator without degrading decision-relevant retrieval quality. We propose Kernel Affine Hull Machines (KAHMs), which map inexpensive lexical features into a frozen semantic embedding space by estimating prototype-mixture weights in a rigorously specified RKHS and refining prototypes via normalized least-mean-squares, yielding a transparent decomposition of encoding error into posterior-approximation, generalization, and teacher-noise components. On a controlled Austrian-law benchmark (5,000 queries; 84 laws; 10,762 units), KAHM attains the strongest teacher-space reconstruction among matched learned adapters (MSE 0.000091, R² 0.9071, cosine 0.9536) and consistently leads rank-sensitive metrics, including mean reciprocal rank at 20 (MRR@20, the average inverse rank of the first relevant result within the top 20), Hit rate at 20 (Hit@20, the fraction of queries with at least one relevant result in the top 20), and Top-1 accuracy (the fraction of queries whose correct item is ranked first), with scores of 0.504, 0.694, and 0.411, respectively. It also reduces per-query latency by a factor of 8.5 relative to direct transformer encoding. These results demonstrate that, in fixed-teacher regimes, lightweight geometric estimators can substitute for online neural encoding, preserving retrieval performance while substantially improving efficiency and interpretability.
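The abstract spells out the three rank metrics precisely; they can be implemented directly from those definitions. The function name and data layout (per-query ranked lists plus sets of relevant items) are mine, not the paper's:

```python
def retrieval_metrics(ranked_lists, relevant_sets, k=20):
    """Per the abstract's definitions:
    MRR@k  - average inverse rank of the first relevant result in the top k
             (contributing 0 when none appears);
    Hit@k  - fraction of queries with at least one relevant result in the top k;
    Top-1  - fraction of queries whose first-ranked item is relevant."""
    mrr = hit = top1 = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        top_k = ranked[:k]
        if top_k and top_k[0] in relevant:
            top1 += 1
        for rank, doc in enumerate(top_k, start=1):
            if doc in relevant:
                mrr += 1.0 / rank   # reciprocal rank of first relevant hit
                hit += 1
                break               # only the first relevant result counts
    n = len(ranked_lists)
    return mrr / n, hit / n, top1 / n
```

For example, with rankings `[["a","b"], ["x","y"], ["q"]]` and relevant sets `[{"a"}, {"y"}, {"z"}]`, this yields MRR@20 = (1 + 1/2 + 0)/3 = 0.5, Hit@20 = 2/3, and Top-1 = 1/3.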