HiCI: Hierarchical Construction-Integration for Long-Context Attention

arXiv cs.CL / 3/24/2026


Key Points

  • The paper introduces HiCI (Hierarchical Construction–Integration), a hierarchical attention module that explicitly builds segment-level representations, integrates them into a global context, and then conditions segment-level attention on both.
  • Experiments use parameter-efficient adaptation of LLaMA-2 with under 5.5% additional parameters, extending context length from 4K up to 100K tokens (7B) and 64K tokens (13B).
  • Across language modeling, retrieval, and instruction-following benchmarks, HiCI shows consistent gains over strong baselines, including competitive performance with proprietary models on topic retrieval.
  • The approach is described as adding an inductive bias that makes local-to-global information structuring explicit for long-context modeling, with improvements even over GPT-3.5-Turbo-16K on code comprehension.
  • Overall results suggest that explicit hierarchical structuring can be an effective architectural direction for long-context attention beyond raw token-level scalability.
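The construct-integrate-broadcast flow described above can be sketched as a toy PyTorch module. This is an illustrative reconstruction, not the paper's implementation: the class name, the mean-pool construction step, and the choice of standard multi-head attention for both integration and segment-level attention are all assumptions made for clarity.

```python
import torch
import torch.nn as nn

class HiCISketch(nn.Module):
    """Toy sketch of hierarchical construction-integration (illustrative only).

    Splits a long sequence into fixed-size segments, constructs one summary
    vector per segment, integrates the summaries into a global context via
    attention across segments, then conditions token-level attention within
    each segment on the broadcast global context.
    """

    def __init__(self, d_model: int, seg_len: int, n_heads: int = 4):
        super().__init__()
        self.seg_len = seg_len
        # Construction: project each segment's pooled tokens into a summary.
        self.construct = nn.Linear(d_model, d_model)
        # Integration: attention across segment summaries -> global context.
        self.integrate = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Segment-level attention over [segment tokens; global context].
        self.local_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        assert t % self.seg_len == 0, "pad input to a multiple of seg_len"
        n_seg = t // self.seg_len
        segs = x.view(b, n_seg, self.seg_len, d)
        # 1) Construction: mean-pool + projection gives segment summaries.
        summaries = self.construct(segs.mean(dim=2))          # (b, n_seg, d)
        # 2) Integration: segments attend to one another -> global context,
        #    one integrated vector per segment position.
        global_ctx, _ = self.integrate(summaries, summaries, summaries)
        # 3) Broadcast: each token attends over its own segment plus that
        #    segment's integrated global-context vector.
        out = torch.empty_like(segs)
        for i in range(n_seg):
            kv = torch.cat([segs[:, i], global_ctx[:, i : i + 1]], dim=1)
            out[:, i], _ = self.local_attn(segs[:, i], kv, kv)
        return out.view(b, t, d)
```

In this sketch the segment loop is written naively for readability; a real long-context implementation would batch the per-segment attention and use a far more efficient kernel, but the data flow (construct, integrate, broadcast) matches the description above.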

Abstract

Long-context language modeling is commonly framed as a scalability challenge of token-level attention, yet local-to-global information structuring remains largely implicit in existing approaches. Drawing on cognitive theories of discourse comprehension, we propose HiCI (Hierarchical Construction–Integration), a hierarchical attention module that constructs segment-level representations, integrates them into a shared global context, and broadcasts both to condition segment-level attention. We validate HiCI through parameter-efficient adaptation of LLaMA-2 with only <5.5% additional parameters, extending context from 4K to 100K tokens (7B) and 64K tokens (13B). Across language modeling, retrieval, and instruction-following benchmarks, HiCI yields consistent improvements over strong baselines, including matching proprietary models on topic retrieval and surpassing GPT-3.5-Turbo-16K on code comprehension. These results demonstrate the effectiveness of explicit hierarchical structuring as an inductive bias for long-context modeling.