When Do Early-Exit Networks Generalize? A PAC-Bayesian Theory of Adaptive Depth

arXiv cs.LG / 4/20/2026


Key Points

  • The paper provides a unified PAC-Bayesian framework to explain the generalization of early-exit (adaptive-depth) neural networks, which exit at intermediate layers for inference speedups.
  • It derives new entropy-based generalization bounds that scale with exit-depth entropy H(D) and expected depth E[D] rather than the maximum depth K, improving the sample complexity to O((E[D]·d + H(D))/ε^2).
  • The authors give explicit constructive constants (including a leading coefficient sqrt(2 ln 2) ≈ 1.177) and show conditions where adaptive-depth networks provably outperform fixed-depth models.
  • The theory is extended beyond strict label independence to ε-approximate policies, making it more applicable to learned routing mechanisms.
  • Experiments across 6 architectures and 7 benchmarks validate the tightness of the bounds (tightness ratios 1.52–3.87×, p < 0.001) and show that threshold selection guided by the bounds closely matches validation-tuned performance (within 0.1–0.3%).
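The quantities driving the new bounds can be computed directly from an empirical exit-depth distribution. The sketch below is illustrative only: it treats the exit depth D as a discrete distribution over K exits and evaluates the paper's complexity expression O((E[D]·d + H(D))/ε^2) with the stated sqrt(2 ln 2) ≈ 1.177 coefficient placed as a leading factor, which is an assumption about where that constant enters, not the paper's exact bound.

```python
import math

def exit_stats(exit_probs):
    """Entropy H(D) in bits and expected depth E[D] of an exit-depth distribution.

    exit_probs[k] is the probability of exiting at depth k+1 (depths 1..K)."""
    assert abs(sum(exit_probs) - 1.0) < 1e-9
    H = -sum(p * math.log2(p) for p in exit_probs if p > 0)
    E = sum((k + 1) * p for k, p in enumerate(exit_probs))
    return H, E

def complexity_term(exit_probs, d, eps):
    """Illustrative sqrt(2 ln 2) * (E[D]*d + H(D)) / eps^2; coefficient placement assumed."""
    H, E = exit_stats(exit_probs)
    return math.sqrt(2 * math.log(2)) * (E * d + H) / eps**2

# A network with K = 4 exits where most samples exit early:
probs = [0.5, 0.3, 0.15, 0.05]
H, E = exit_stats(probs)
print(H, E)  # H(D) ≈ 1.65 bits, E[D] = 1.75, far below the max depth K = 4
```

Because E[D] = 1.75 rather than K = 4, the adaptive-depth term E[D]·d + H(D) is much smaller than the K·d scaling of a classical fixed-depth bound, which is the source of the improved sample complexity.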

Abstract

Early-exit neural networks enable adaptive computation by allowing confident predictions to exit at intermediate layers, achieving 2–8× inference speedups. Despite widespread deployment, their generalization properties lack theoretical understanding, a gap explicitly identified in recent surveys. This paper establishes a unified PAC-Bayesian framework for adaptive-depth networks. (1) Novel Entropy-Based Bounds: We prove the first generalization bounds depending on exit-depth entropy H(D) and expected depth E[D] rather than maximum depth K, with sample complexity O((E[D]·d + H(D))/ε^2). (2) Explicit Constructive Constants: Our analysis yields the leading coefficient sqrt(2 ln 2) ≈ 1.177 with a complete derivation. (3) Provable Early-Exit Advantages: We establish sufficient conditions under which adaptive-depth networks strictly outperform fixed-depth counterparts. (4) Extension to Approximate Label Independence: We relax the label-independence assumption to ε-approximate policies, broadening applicability to learned routing. (5) Comprehensive Validation: Experiments across 6 architectures on 7 benchmarks demonstrate tightness ratios of 1.52–3.87× (all p < 0.001), versus >100× for classical bounds. Bound-guided threshold selection matches validation-tuned performance within 0.1–0.3%.
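The adaptive-computation mechanism the abstract describes, exiting once an intermediate classifier is confident, can be sketched as a generic confidence-thresholded loop. This is a minimal illustration of the standard pattern, not the paper's specific routing policy; the `(block, head)` stage interface and the toy model are hypothetical.

```python
import numpy as np

def early_exit_predict(x, stages, threshold=0.9):
    """Run (block, head) stages in order; return (prediction, exit_depth) at the
    first exit head whose max softmax probability clears `threshold`."""
    h = x
    for depth, (block, head) in enumerate(stages, start=1):
        h = block(h)                       # intermediate representation
        logits = head(h)                   # exit classifier at this depth
        p = np.exp(logits - logits.max())  # numerically stable softmax
        p /= p.sum()
        if p.max() >= threshold or depth == len(stages):
            return int(p.argmax()), depth

# Toy two-exit model (hypothetical): the first head is confident only for small inputs.
stages = [
    (lambda h: h,     lambda h: np.array([3.0, 0.0]) if np.linalg.norm(h) < 1
                                else np.array([0.1, 0.0])),
    (lambda h: 2 * h, lambda h: np.array([0.0, 5.0])),
]
print(early_exit_predict(np.array([0.1]), stages))   # → (0, 1): exits at depth 1
print(early_exit_predict(np.array([10.0]), stages))  # → (1, 2): falls through to depth 2
```

Running such a policy over a dataset yields exactly the exit-depth distribution D whose entropy H(D) and mean E[D] the bounds depend on, and sweeping `threshold` is the knob that bound-guided threshold selection tunes.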