Sparse Goodness: How Selective Measurement Transforms Forward-Forward Learning

arXiv cs.AI / April 16, 2026


Key Points

  • The paper analyzes the Forward-Forward (FF) learning algorithm’s “goodness function” design choices, focusing on which activations to measure and how to aggregate them layer-wise.
  • It proposes a sparse “top-k goodness” metric that evaluates only the k most active neurons, improving Fashion-MNIST accuracy by 22.6 percentage points over the standard sum-of-squares (SoS) baseline.
  • It introduces “entmax-weighted energy,” which replaces hard top-k selection with a learnable soft-sparse weighting based on the alpha-entmax transformation, delivering further accuracy gains.
  • By combining sparse goodness with a separate label feature forwarding approach (injecting class hypotheses at every layer through dedicated projections), the authors reach 87.1% accuracy on Fashion-MNIST with a 4x2000 architecture.
  • Controlled experiments across multiple goodness functions, architectures, and sparsity settings suggest that adaptive sparsity (alpha ≈ 1.5) is the most important design factor for FF networks.
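The contrast between the standard sum-of-squares goodness and the paper's top-k variant can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' code; function names and the omitted per-layer details (goodness threshold, layer normalization) are assumptions.

```python
import numpy as np

def sum_of_squares_goodness(h):
    """Standard FF goodness: sum of squared activations per sample."""
    return np.sum(h ** 2, axis=-1)

def topk_goodness(h, k):
    """Sparse goodness: sum of squares over only the k most active neurons."""
    sq = h ** 2
    # np.partition places the k largest squared activations in the last k slots.
    topk = np.partition(sq, -k, axis=-1)[..., -k:]
    return np.sum(topk, axis=-1)

# One sample with four hidden activations.
h = np.array([[3.0, 0.1, -2.0, 0.05]])
print(sum_of_squares_goodness(h))  # all four neurons contribute (9 + 0.01 + 4 + 0.0025)
print(topk_goodness(h, k=2))       # only the two largest squares (9.0 and 4.0) contribute
```

Because only the top-k squared activations enter the goodness, weakly active neurons cannot pad the score, which is the sparsity effect the paper credits for the accuracy gains.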

Abstract

The Forward-Forward (FF) algorithm is a biologically plausible alternative to backpropagation that trains neural networks layer by layer using a local goodness function to distinguish positive from negative data. Since its introduction, sum-of-squares (SoS) has served as the default goodness function. In this work, we systematically study the design space of goodness functions, investigating both which activations to measure and how to aggregate them. We introduce top-k goodness, which evaluates only the k most active neurons, and show that it substantially outperforms SoS, improving Fashion-MNIST accuracy by 22.6 percentage points. We further introduce entmax-weighted energy, which replaces hard top-k selection with a learnable sparse weighting based on the alpha-entmax transformation, yielding additional gains. Orthogonally, we adopt separate label feature forwarding (FFCL), in which class hypotheses are injected at every layer through a dedicated projection rather than concatenated only at the input. Combining these ideas, we achieve 87.1 percent accuracy on Fashion-MNIST with a 4x2000 architecture, representing a 30.7 percentage point improvement over the SoS baseline while changing only the goodness function and the label pathway. Across controlled experiments covering 11 goodness functions, two architectures, and a sparsity spectrum analysis over both k and alpha, we identify a consistent principle: sparsity in the goodness function is the most important design choice in FF networks. In particular, adaptive sparsity with alpha approximately 1.5 outperforms both fully dense and fully sparse alternatives.
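The entmax-weighted-energy idea can be sketched as follows: alpha-entmax maps a score vector to a sparse probability distribution (alpha = 2 gives sparsemax; alpha near 1 approaches softmax), and those sparse weights then aggregate the squared activations. The bisection routine below and the choice to derive the weights from the squared activations are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def entmax_bisect(z, alpha=1.5, n_iter=60):
    """alpha-entmax via bisection on the threshold tau (a numerical sketch).

    Solves p_i = max((alpha - 1) * z_i - tau, 0) ** (1 / (alpha - 1))
    with tau chosen so that sum(p) = 1. Requires alpha > 1; alpha = 2
    recovers sparsemax.
    """
    z = (alpha - 1.0) * np.asarray(z, dtype=float)
    # tau in [max(z) - 1, max(z)] brackets the solution: the lower bound
    # gives sum(p) >= 1, the upper bound gives sum(p) = 0.
    lo, hi = np.max(z) - 1.0, np.max(z)
    for _ in range(n_iter):
        tau = 0.5 * (lo + hi)
        p = np.clip(z - tau, 0.0, None) ** (1.0 / (alpha - 1.0))
        if p.sum() >= 1.0:
            lo = tau  # tau too small: distribution over-full, raise threshold
        else:
            hi = tau  # tau too large: distribution under-full, lower threshold
    p = np.clip(z - lo, 0.0, None) ** (1.0 / (alpha - 1.0))
    return p / p.sum()

def entmax_weighted_energy(h, alpha=1.5):
    """Goodness as a sparse, entmax-weighted energy over squared activations."""
    sq = h ** 2
    return np.sum(entmax_bisect(sq, alpha=alpha) * sq)
```

With alpha around 1.5, the weighting zeroes out weakly active neurons but keeps a graded emphasis among the strong ones, which matches the paper's finding that adaptive sparsity beats both fully dense (SoS) and fully hard (top-k) aggregation.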