Floating-Point Networks with Automatic Differentiation Can Represent Almost All Floating-Point Functions and Their Gradients

arXiv cs.LG / 5/5/2026


Key Points

  • The paper studies whether the classic universal approximation results for differentiable functions and their gradients still hold when neural networks are evaluated in actual floating-point arithmetic, with round-off error.
  • It proves that given a floating-point function φ (such as a loss function) and any desired floating-point-valued outputs and input gradients, there exists a floating-point neural network f that represents those outputs while the automatically differentiated quantity D^AD(φ∘f) represents those gradients (a minimal sketch of this quantity follows the list).
  • The authors extend the result to multiple functions φ1,…,φn, showing that, under mild assumptions, D^AD(φi∘f) can simultaneously realize arbitrary gradients for each i while f represents the target function values.
  • The theoretical guarantees apply to commonly used practical activation functions, including ReLU, ELU, GeLU, Swish, Sigmoid, and tanh.
  • Overall, the work provides a floating-point-and-automatic-differentiation analogue of universal approximation for both function values and gradients, bridging a gap between idealized real-parameter theory and implementable numerical computation.
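
To make the quantity in these statements concrete, the following is a minimal sketch, not the paper's construction: a toy float32 ReLU network f, a stand-in loss φ, and the input gradient D^AD(φ∘f) obtained by reverse-mode automatic differentiation in JAX. The network shape, random weights, and choice of φ are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's construction) of D^AD(φ∘f):
# the input gradient that reverse-mode AD returns for a loss φ composed with
# a network f, with every operation carried out in float32.
import jax
import jax.numpy as jnp

def f(x, params):
    """Toy one-hidden-layer ReLU network; shapes and weights are illustrative."""
    W1, b1, W2, b2 = params
    h = jnp.maximum(W1 @ x + b1, 0.0)  # ReLU, evaluated in float32
    return W2 @ h + b2

def phi(y):
    """Stand-in floating-point loss (squared error against zero)."""
    return jnp.sum(y ** 2)

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
params = (
    jax.random.normal(k1, (4, 3), dtype=jnp.float32),  # W1
    jnp.zeros(4, dtype=jnp.float32),                    # b1
    jax.random.normal(k2, (2, 4), dtype=jnp.float32),   # W2
    jnp.zeros(2, dtype=jnp.float32),                    # b2
)

x = jnp.array([0.1, -0.3, 0.7], dtype=jnp.float32)

value = phi(f(x, params))                            # f's output fed into φ, in float32
grad_x = jax.grad(lambda x_: phi(f(x_, params)))(x)  # D^AD(φ∘f)(x): input gradient via AD
print(value, grad_x)
```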

Abstract

Theoretical studies show that for any differentiable function on a compact domain, there exists a neural network that approximates both the function values and gradients. However, such a result cannot be used in practice since it assumes real parameters and exact internal operations. In contrast, real implementations only use a finite subset of reals and machine operations with round-off errors. In this work, we investigate whether a similar result holds for neural networks under floating-point arithmetic, when the gradient with respect to the input is computed by the automatic differentiation algorithm D^AD. We first show that given a floating-point function φ (e.g., a loss function), arbitrary function values and gradients can be represented by a floating-point network f and D^AD(φ∘f), respectively. We further extend this result: given φ1,…,φn, D^AD(φi∘f) can simultaneously represent arbitrary gradients while f represents the target values, under mild conditions. Our results hold for practical activation functions, e.g., ReLU, ELU, GeLU, Swish, Sigmoid, and tanh.
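
The motivation above, that implementations only use a finite subset of the reals together with machine operations that round, can be seen in a one-line float32 check; the example below is an illustration of this point, not material from the paper.

```python
# Small illustration (not from the paper) of float32 round-off: floats are a
# finite set of values, and arithmetic rounds its results.
import jax.numpy as jnp

one = jnp.float32(1.0)
tiny = jnp.float32(1e-8)           # smaller than half the gap above 1.0 in float32

print(one + tiny == one)           # True: the sum rounds back to exactly 1.0
print(jnp.finfo(jnp.float32).eps)  # ~1.1920929e-07, the gap from 1.0 to the next float
```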