Large Language Models Explore by Latent Distilling
arXiv cs.LG / 4/29/2026
💬 Opinion · Developer Stack & Infrastructure · Models & Research
Key Points
- The paper introduces Exploratory Sampling (ESamp), a decoding method designed to encourage semantic diversity in LLM outputs rather than relying on superficial lexical variation from standard stochastic sampling.
- ESamp trains a lightweight test-time distiller to map shallow-layer representations to deep-layer hidden states, and uses the distiller’s prediction error as a novelty signal during generation.
- During decoding, ESamp reweights candidate next-token continuations based on the current prefix and the measured novelty, biasing the model toward less-explored semantic patterns (see the sketch after this list).
- The approach uses an asynchronous training–inference pipeline with low overhead (under 5% in the worst case, 1.2% in an optimized release) and improves Pass@k efficiency on reasoning models.
- Experiments indicate ESamp generalizes well across mathematics, science, and code-generation benchmarks, and it softens the usual trade-off between diversity and coherence in creative writing.
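To make the mechanism concrete, here is a minimal PyTorch sketch of the pipeline the key points describe. Every name here (`LatentDistiller`, `distill_step`, `esamp_next_token`, the two-layer probe, and the temperature-based reweighting rule) is an illustrative assumption; the paper is only summarized as saying that a lightweight distiller maps shallow-layer representations to deep-layer hidden states, that its prediction error acts as a novelty signal, and that candidate continuations are reweighted by that signal.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentDistiller(nn.Module):
    """Lightweight test-time probe that predicts a deep-layer hidden
    state from the corresponding shallow-layer representation.
    (Hypothetical architecture; the paper only says it is lightweight.)"""
    def __init__(self, d_model: int, d_hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, shallow: torch.Tensor) -> torch.Tensor:
        return self.net(shallow)

def distill_step(distiller, opt, shallow, deep):
    """One training step on states collected during generation: the
    distiller learns to reproduce deep semantics, so frequently seen
    (i.e., already explored) patterns end up with low prediction error."""
    loss = F.mse_loss(distiller(shallow), deep.detach())
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

@torch.no_grad()
def novelty(distiller, shallow, deep):
    """Novelty signal: the distiller's prediction error on the current
    prefix state. High error means the deep semantics are not yet
    well modeled, i.e., the region is less explored."""
    return F.mse_loss(distiller(shallow), deep).item()

@torch.no_grad()
def esamp_next_token(logits, nov, base_temp=0.7, gain=0.5):
    """Hypothetical reweighting rule: when the prefix is semantically
    familiar (low novelty), raise the sampling temperature to push
    toward less-explored continuations; when it is already novel,
    sample at the base temperature to preserve coherence."""
    temp = base_temp * (1.0 + gain / (1.0 + nov))
    probs = F.softmax(logits / temp, dim=-1)
    return torch.multinomial(probs, num_samples=1)

if __name__ == "__main__":
    d = 64
    distiller = LatentDistiller(d)
    opt = torch.optim.Adam(distiller.parameters(), lr=1e-3)
    shallow, deep = torch.randn(8, d), torch.randn(8, d)  # stand-in hidden states
    distill_step(distiller, opt, shallow, deep)
    nov = novelty(distiller, shallow[-1], deep[-1])
    token = esamp_next_token(torch.randn(32000), nov)  # stand-in 32k-vocab logits
```

In this reading, the asynchronous part of the pipeline would run `distill_step` on a side thread over states gathered during decoding, so the generation loop only pays for the cheap `novelty` forward pass, which is consistent with the reported sub-5% overhead.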
Related Articles
LLMs will be a commodity
Reddit r/artificial

What it feels like to have Qwen 3.6 or Gemma 4 running locally
Reddit r/LocalLLaMA

From Fault Codes to Smart Fixes: How Google Cloud NEXT ’26 Inspired My AI Mechanic Assistant
Dev.to

Dex lands $5.3M to grow its AI-driven talent matching platform
Tech.eu

7 OpenClaw Money-Making Cases in One Week — and the Hidden Cost Problem Behind Them
Dev.to