Neural Uncertainty Principle: A Unified View of Adversarial Fragility and LLM Hallucination

arXiv cs.LG / March 23, 2026


Key Points

  • The paper introduces the Neural Uncertainty Principle (NUP), a shared, loss-driven bound that explains adversarial fragility in vision models and hallucination in LLMs as arising from the same uncertainty budget between an input and its loss gradient.
  • In near-bound regimes, additional compression increases sensitivity dispersion (adversarial fragility) and weak prompt-gradient coupling makes generation under-constrained (hallucination).
  • The bound is modulated by an input-gradient correlation channel, detectable via a specifically designed single-backward probe that serves as a risk signal.
  • To improve robustness without adversarial training, the paper proposes ConjMask (masking high-contribution input components) and LogitReg (logit-side regularization), plus using the probe for decoding-free hallucination risk detection and prompt selection in LLMs.
  • Overall, NUP provides a unified, practical framework for diagnosing and mitigating boundary anomalies across perception and generation tasks, with implications for robust model design and evaluation.
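The paper does not publish code for the single-backward probe, but the idea in the bullets above can be sketched concretely. The following is a minimal, hypothetical illustration (not the authors' implementation): for a toy linear classifier, one "backward pass" amounts to an analytic cross-entropy gradient with respect to the input, and the probe reads out the input-gradient coupling as a cosine similarity. The function name `probe_correlation` and the choice of cosine similarity as the coupling measure are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

def probe_correlation(W, x, y_onehot):
    """Hypothetical single-backward probe for a linear classifier.

    Computes the cross-entropy loss gradient w.r.t. the input in closed
    form (one 'backward pass') and returns the cosine similarity between
    the input and that gradient as an input-gradient coupling signal.
    """
    z = W @ x                           # logits
    g = W.T @ (softmax(z) - y_onehot)   # analytic grad of CE loss w.r.t. x
    return float(g @ x / (np.linalg.norm(g) * np.linalg.norm(x) + 1e-12))

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))   # toy weight matrix
x = rng.normal(size=8)        # toy input
y = np.eye(3)[1]              # one-hot label
rho = probe_correlation(W, x, y)
```

In a real network the same quantity would come from one autograd backward pass over the loss; here the linear model keeps the sketch dependency-free. Under the NUP view, `rho` near zero would indicate weak input-gradient coupling, the regime the paper associates with under-constrained generation.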

Abstract

Adversarial vulnerability in vision models and hallucination in large language models are conventionally treated as separate problems, each addressed with modality-specific patches. This study reveals that they share a common geometric origin: the input and its loss gradient are conjugate observables subject to an irreducible uncertainty bound. Formalizing this as a Neural Uncertainty Principle (NUP) under a loss-induced state, we find that in near-bound regimes, further compression must be accompanied by increased sensitivity dispersion (adversarial fragility), while weak prompt-gradient coupling leaves generation under-constrained (hallucination). Crucially, the bound is modulated by an input-gradient correlation channel, captured by a specifically designed single-backward probe. In vision, masking highly coupled input components improves robustness without costly adversarial training; in language, the same prefill-stage probe detects hallucination risk before any answer tokens are generated. Guided by this theory, we propose ConjMask (masking high-contribution input components) and LogitReg (logit-side regularization) to improve robustness without adversarial training, and we use the probe as a decoding-free risk signal for LLMs, enabling hallucination detection and prompt selection. NUP thus turns two seemingly separate failure taxonomies into a shared uncertainty-budget view and provides a unified, practical framework for diagnosing and mitigating boundary anomalies across perception and generation tasks.
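The abstract describes ConjMask as "masking high-contribution input components" but gives no algorithm. A minimal sketch of one plausible reading, again using the toy linear model so the loss gradient is analytic: score each input component by the magnitude of its input-gradient product, then zero out the most strongly coupled components. The function name `conjmask`, the contribution score `|x_i * g_i|`, and zeroing (rather than, say, mean-imputing) the masked components are all assumptions for illustration, not the paper's specification.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

def conjmask(W, x, y_onehot, k=2):
    """Hypothetical ConjMask sketch for a linear classifier.

    Ranks input components by |x_i * g_i|, where g is the analytic
    cross-entropy gradient w.r.t. x, and zeros out the k most strongly
    coupled components before the input is classified.
    """
    z = W @ x
    g = W.T @ (softmax(z) - y_onehot)   # analytic CE gradient w.r.t. x
    contrib = np.abs(x * g)             # per-component input-gradient coupling
    mask_idx = np.argsort(contrib)[-k:] # indices of the top-k contributors
    x_masked = x.copy()
    x_masked[mask_idx] = 0.0
    return x_masked

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))
x = rng.normal(size=8)
y = np.eye(3)[0]
x_m = conjmask(W, x, y, k=2)
```

The design intuition, in NUP terms, is that the most strongly coupled components dominate the sensitivity budget, so suppressing them trades a little signal for robustness without any adversarial training loop.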