Enhancing Hyperspace Analogue to Language (HAL) Representations via Attention-Based Pooling for Text Classification

arXiv cs.CL / 3/23/2026


Key Points

  • The paper proposes integrating a learnable, temperature-scaled additive attention mechanism into HAL representations to improve sentence-level embeddings beyond mean pooling.
  • It addresses sparsity and high dimensionality of HAL co-occurrence matrices by applying truncated SVD to project vectors into a dense latent space before the attention layer.
  • On the IMDB sentiment analysis dataset, the approach achieves 82.38% test accuracy, a 6.74 percentage point improvement over the traditional mean-pooling baseline (75.64%).
  • Qualitative analysis indicates the attention weights suppress stop-words and focus on sentiment-bearing tokens, boosting both performance and interpretability.
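The paper does not publish its implementation, but the pooling step it describes can be sketched in a few lines. The following is a minimal NumPy illustration of temperature-scaled additive (Bahdanau-style) attention pooling over token vectors; the `tanh` scoring form, parameter names, and function signature are assumptions, not the authors' exact architecture:

```python
import numpy as np

def attention_pool(token_vecs, w, b, v, temperature=1.0):
    """Temperature-scaled additive attention pooling.

    token_vecs: (T, d) token embeddings (e.g., SVD-reduced HAL vectors)
    w: (d, h) projection, b: (h,) bias, v: (h,) scoring vector (learnable)
    Returns a (d,) sentence embedding: the attention-weighted sum of tokens.
    """
    # Additive attention scores: v^T tanh(W x_t + b), one scalar per token
    scores = np.tanh(token_vecs @ w + b) @ v          # shape (T,)
    # Temperature scaling: tau < 1 sharpens the distribution, tau > 1 smooths it
    scores = scores / temperature
    # Numerically stable softmax over tokens -> weights summing to 1
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()
    return weights @ token_vecs                       # shape (d,)
```

Note that as the temperature grows large, the weights approach uniform and the pooled vector converges to the mean-pooling baseline, which makes mean pooling a limiting special case of this scheme.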

Abstract

The Hyperspace Analogue to Language (HAL) model relies on global word co-occurrence matrices to construct distributional semantic representations. While these representations capture lexical relationships effectively, aggregating them into sentence-level embeddings via standard mean pooling often results in information loss. Mean pooling assigns equal weight to all tokens, thereby diluting the impact of contextually salient words with uninformative structural tokens. In this paper, we address this limitation by integrating a learnable, temperature-scaled additive attention mechanism into the HAL representation pipeline. To mitigate the sparsity and high dimensionality of the raw co-occurrence matrices, we apply Truncated Singular Value Decomposition (SVD) to project the vectors into a dense latent space prior to the attention layer. We evaluate the proposed architecture on the IMDB sentiment analysis dataset. Empirical results demonstrate that the attention-based pooling approach achieves a test accuracy of 82.38%, yielding an absolute improvement of 6.74 percentage points over the traditional mean pooling baseline (75.64%). Furthermore, qualitative analysis of the attention weights indicates that the mechanism successfully suppresses stop-words and selectively attends to sentiment-bearing tokens, improving both classification performance and model interpretability.
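To make the pipeline concrete, the two preprocessing stages the abstract describes (a HAL co-occurrence matrix followed by truncated SVD) could be sketched as below. The proximity weighting `window - d + 1` follows the standard HAL convention; the tokenization, window size, and function names are illustrative assumptions rather than the paper's exact setup:

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import svds

def hal_matrix(corpus, vocab, window=5):
    """Build a HAL-style word-by-word co-occurrence matrix.

    For each token, counts of the following tokens within `window` positions
    are accumulated, weighted by proximity (window - distance + 1) so that
    nearer neighbors contribute more, as in the original HAL scheme.
    """
    idx = {w: i for i, w in enumerate(vocab)}
    M = lil_matrix((len(vocab), len(vocab)))
    for sent in corpus:
        tokens = [idx[t] for t in sent.split() if t in idx]
        for i, wi in enumerate(tokens):
            for d in range(1, window + 1):
                if i + d < len(tokens):
                    M[wi, tokens[i + d]] += window - d + 1
    return M.tocsr()

def reduce_hal(M, k=2):
    """Project the sparse HAL rows into a dense k-dim latent space
    via truncated SVD, prior to the attention layer."""
    u, s, _ = svds(M.asfptype(), k=k)
    return u * s  # (|V|, k) dense word embeddings
```

The sparse representation matters here: a realistic vocabulary makes the full co-occurrence matrix far too large to densify, and truncated SVD only needs the leading singular vectors.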