The Banach-Butterfly Invariant: Influence-Adaptive Walsh Geometry for Ternary Polynomial Threshold Functions

arXiv cs.LG / 5/5/2026

📰 NewsDeveloper Stack & InfrastructureSignals & Early TrendsIdeas & Deep AnalysisModels & Research

Key Points

  • The paper introduces the Banach-Butterfly Invariant (BBT), an influence-adaptive Banach-geometry measure built on the Walsh-Hadamard butterfly factorization that assigns exponents to layers based on each coordinate’s influence.
  • It derives a Jensen-type lower bound for the contraction invariant μ(f) in terms of total influence I(f) and proves μ is strictly Schur-convex (up to permutation) with respect to the influence vector.
  • The authors characterize how μ scales for important Boolean functions, giving asymptotic behaviors such as μ ~ 2^{-n/2} for parity, μ = 2^{-Theta(√n)} for majority, and μ = 2^{-1/2} for dictators.
  • Using small, certified ternary Walsh-threshold testbeds (n ≤ 4) and sampled NPN-canonical representatives (n = 5), they compute MILP minimum-support certificates and show that μ can distinguish functions with the same total influence, but it is not a universally monotone predictor of minimum support across different n.
  • A companion application paper leverages a real-valued WHT “activation-energy” proxy inspired by the theory to improve compression/quantization of LLMs, reducing wikitext-2 perplexity by 15–58% relative to a baseline auto-round approach, though the transfer is qualitative rather than formally proved.

Abstract

We introduce the Banach-Butterfly Invariant (BBT), an influence-adaptive Banach geometry on the Walsh-Hadamard butterfly factorization. For a Boolean function f:\{-1,+1\}^n\to\{-1,+1\} with coordinate influences \mathrm{Inf}_\ell(f), BBT assigns exponent p_\ell = 1+\mathrm{Inf}_\ell(f) to butterfly layer \ell, yielding the contraction invariant \mu(f)=\prod_\ell 2^{-\mathrm{Inf}_\ell/(1+\mathrm{Inf}_\ell)}. We prove a Jensen lower bound \log_2\mu(f) \ge -I(f)/(1+I(f)/n) and that \mu is strictly Schur-convex in the influence vector (modulo permutation), giving scaling classes \mu\sim 2^{-n/2} (parity), 2^{-\Theta(\sqrt{n})} (majority), 2^{-1/2} (dictators). \log_2\mu is rational but not polynomial in the Fourier coefficients while \mu is algebraic, and \mu separates functions with identical total influence (122 pairs at n=3). Using the certified n \le 4 ternary Walsh-threshold universe from a companion synthesis manuscript as a finite testbed, we compute exact MILP minimum-support certificates for all 65,536 Boolean functions at n=4 (mean 6.42, max 9, all-odd by a parity argument) and on 10,000 of the 616,126 NPN-canonical representatives we enumerate at n=5 (matching OEIS A000370). Conditional Spearman \rho(\mu,|\mathrm{supp}|) at fixed total influence is +0.571 in the largest stratum at n=4 but reverses to -0.38 at n=5 under both function-uniform and NPN-canonical sampling: \mu is a valid Schur-convex concentration invariant, not a universal monotone predictor of minimum support across n. A companion application paper validates a real-valued WHT activation-energy proxy inspired by this theory on five pretrained LLMs at W2A16, cutting wikitext-2 perplexity by 15-58% versus vanilla auto-round; the transfer from Boolean theory to the real-valued proxy is qualitative, not formal.