Improving LLM Predictions via Inter-Layer Structural Encoders

arXiv cs.CL / 3/25/2026


Key Points

  • The paper argues that LLM predictions need not rely solely on final-layer token representations because intermediate layers can hold more task-relevant information for certain tasks.
  • It proposes Inter-Layer Structural Encoders (ILSE), a method that learns a single effective representation by combining internal representations from multiple layers of an LLM.
  • ILSE’s key component, Cayley-Encoder, uses expander Cayley graphs as a geometric, mathematically grounded mechanism to efficiently propagate structural information across layers.
  • Across 13 classification and semantic similarity tasks using 9 pre-trained LLMs (14M to 8B parameters), ILSE reportedly improves accuracy by up to 44% and similarity metrics by up to 25% versus baselines and prior methods.
  • The method is shown to be data-efficient in few-shot settings and can help smaller models compete with much larger ones.
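The layer-combination idea in the bullets above can be sketched minimally. This is an illustrative stand-in, not the paper's ILSE architecture: the tensor shapes and the softmax-weighted average over layers are assumptions introduced here for clarity.

```python
import numpy as np

def mix_layers(hidden_states, logits):
    """Combine per-layer hidden states into one representation using
    softmax weights over layers (a toy stand-in for a learned mixer)."""
    w = np.exp(logits - logits.max())
    w /= w.sum()
    # hidden_states: (num_layers, seq_len, dim) -> (seq_len, dim)
    return np.tensordot(w, hidden_states, axes=1)

rng = np.random.default_rng(0)
h = rng.standard_normal((12, 5, 64))   # fake hidden states from 12 layers
out = mix_layers(h, np.zeros(12))      # uniform logits -> mean over layers
print(out.shape)  # (5, 64)
```

With zero logits the weights are uniform, so the output is simply the mean over layers; training the logits would let the model emphasize whichever layers are most task-relevant, echoing the paper's observation that the optimal layer varies by task.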

Abstract

The standard practice in Large Language Models (LLMs) is to base predictions on the final-layer token representations. Recent studies, however, show that intermediate layers encode substantial information and may contain more task-relevant features than the final-layer representations alone. Importantly, prior work has shown that different layers may be optimal for different tasks. In this work we introduce Inter-Layer Structural Encoders (ILSE), a structural approach that learns one effective representation jointly from the LLM's internal layer representations. Central to ILSE is Cayley-Encoder, a mathematically grounded geometric encoder that leverages expander Cayley graphs for efficient inter-layer information propagation. We evaluate ILSE across 13 classification and semantic similarity tasks with 9 pre-trained LLMs ranging from 14 million to 8 billion parameters. ILSE consistently outperforms baselines and existing approaches, achieving up to 44% improvement in accuracy and 25% in similarity metrics. We further show that ILSE is data-efficient in few-shot regimes and can make small LLMs competitive with substantially larger models.
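To make the "expander Cayley graph" ingredient concrete, here is a minimal sketch of a Cayley graph over the cyclic group Z_n. This is not the paper's construction: the group, the generator set, and the use of a plain adjacency matrix are all illustrative assumptions; real expander families require carefully chosen groups and generators.

```python
import numpy as np

def cayley_adjacency(n, generators):
    """Adjacency matrix of the Cayley graph of Z_n: vertex i is
    connected to (i + g) mod n for each generator g."""
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for g in generators:
            A[i, (i + g) % n] = 1
    return A

# Symmetric generator set -> undirected 4-regular graph on 13 vertices.
gens = [1, -1, 5, -5]
A = cayley_adjacency(13, gens)
print(A.sum(axis=1))  # every vertex has degree 4

# For a regular graph, expansion is reflected in the spectral gap:
# degree minus the second-largest adjacency eigenvalue.
lam = np.sort(np.linalg.eigvalsh(A))[::-1]
gap = 4 - lam[1]
```

Because every vertex has the same small degree while the graph stays well-connected (large spectral gap), message passing over such a graph can spread information between all layer nodes in few hops at low cost, which is the intuition behind using expanders for inter-layer propagation.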