The Banach-Butterfly Invariant: Influence-Adaptive Walsh Geometry for Ternary Polynomial Threshold Functions

arXiv cs.LG / 5/5/2026

📰 NewsDeveloper Stack & InfrastructureSignals & Early TrendsIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper introduces the Banach-Butterfly Invariant (BBT), an influence-adaptive Banach-geometry measure built on the Walsh-Hadamard butterfly factorization that assigns exponents to layers based on each coordinate’s influence.
It derives a Jensen-type lower bound for the contraction invariant μ(f) in terms of total influence I(f) and proves μ is strictly Schur-convex (up to permutation) with respect to the influence vector.
The authors characterize how μ scales for important Boolean functions, giving asymptotic behaviors such as μ ~ 2^{-n/2} for parity, μ = 2^{-Theta(√n)} for majority, and μ = 2^{-1/2} for dictators.
Using small, certified ternary Walsh-threshold testbeds (n ≤ 4) and sampled NPN-canonical representatives (n = 5), they compute MILP minimum-support certificates and show that μ can distinguish functions with the same total influence, but it is not a universally monotone predictor of minimum support across different n.
A companion application paper leverages a real-valued WHT “activation-energy” proxy inspired by the theory to improve compression/quantization of LLMs, reducing wikitext-2 perplexity by 15–58% relative to a baseline auto-round approach, though the transfer is qualitative rather than formally proved.

Abstract

We introduce the Banach-Butterfly Invariant (BBT), an influence-adaptive Banach geometry on the Walsh-Hadamard butterfly factorization. For a Boolean function

f:\{-1,+1\}^n\to\{-1,+1\}

with coordinate influences

\mathrm{Inf}_\ell(f)

, BBT assigns exponent

p_\ell = 1+\mathrm{Inf}_\ell(f)

to butterfly layer

\ell

, yielding the contraction invariant

\mu(f)=\prod_\ell 2^{-\mathrm{Inf}_\ell/(1+\mathrm{Inf}_\ell)}

. We prove a Jensen lower bound

\log_2\mu(f) \ge -I(f)/(1+I(f)/n)

and that

\mu

is strictly Schur-convex in the influence vector (modulo permutation), giving scaling classes

\mu\sim 2^{-n/2}

(parity),

2^{-\Theta(\sqrt{n})}

(majority),

2^{-1/2}

(dictators).

\log_2\mu

is rational but not polynomial in the Fourier coefficients while

\mu

is algebraic, and

\mu

separates functions with identical total influence (122 pairs at

n=3

). Using the certified

n \le 4

ternary Walsh-threshold universe from a companion synthesis manuscript as a finite testbed, we compute exact MILP minimum-support certificates for all 65,536 Boolean functions at

n=4

(mean 6.42, max 9, all-odd by a parity argument) and on 10,000 of the 616,126 NPN-canonical representatives we enumerate at

n=5

(matching OEIS A000370). Conditional Spearman

\rho(\mu,|\mathrm{supp}|)

at fixed total influence is

+0.571

in the largest stratum at

n=4

but reverses to

-0.38

n=5

under both function-uniform and NPN-canonical sampling:

\mu

is a valid Schur-convex concentration invariant, not a universal monotone predictor of minimum support across

n

. A companion application paper validates a real-valued WHT activation-energy proxy inspired by this theory on five pretrained LLMs at W2A16, cutting wikitext-2 perplexity by 15-58% versus vanilla auto-round; the transfer from Boolean theory to the real-valued proxy is qualitative, not formal.