DiffuMask: Diffusion Language Model for Token-level Prompt Pruning

arXiv cs.CL / 4/9/2026


Key Points

  • DiffuMask is a diffusion-based framework for token-level prompt pruning that aims to reduce the length (and cost) of prompts used for in-context learning and chain-of-thought reasoning in LLMs.
  • Unlike prior approaches that remove tokens sequentially, DiffuMask predicts masks in parallel across multiple tokens per denoising step, significantly speeding up the compression process.
  • The method uses hierarchical pruning signals at both the shot level and token level, with tunable controls to decide how much content to retain.
  • Experiments report up to 80% prompt length reduction while maintaining or improving accuracy across in-domain, out-of-domain, and even cross-model settings.
  • Overall, the paper positions DiffuMask as a generalizable, controllable, and faster approach to prompt compression that can make in-context reasoning more efficient and reliable.

Abstract

In-Context Learning and Chain-of-Thought prompting improve reasoning in large language models (LLMs). These techniques typically come at the cost of longer, more expensive prompts that may contain redundant information. Prompt compression based on pruning offers a practical solution, yet existing methods rely on sequential token removal, which is computationally intensive. We present DiffuMask, a diffusion-based framework that integrates hierarchical shot-level and token-level pruning signals and enables rapid, parallel prompt pruning via iterative mask prediction. DiffuMask substantially accelerates the compression process by masking multiple tokens in each denoising step. It offers tunable control over retained content, preserving essential reasoning context and achieving up to 80% prompt length reduction. Meanwhile, it maintains or improves accuracy across in-domain, out-of-domain, and cross-model settings. Our results show that DiffuMask provides a generalizable and controllable framework for prompt compression, facilitating faster and more reliable in-context reasoning in LLMs.