K-Way Energy Probes for Metacognition Reduce to Softmax in Discriminative Predictive Coding Networks

arXiv cs.LG / 4/14/2026


Key Points

  • The preprint argues that the K-way energy probe in predictive coding networks (PCNs), which appears to read a richer signal than softmax, does not do so under the standard Pinchetti-style discriminative predictive-coding formulation.
  • It presents an approximate reduction: under target-clamped CE-energy training with effectively feedforward latent dynamics, the K-way energy margin decomposes into a monotone function of the log-softmax margin plus a residual that is not trained to correlate with correctness.
  • Experiments on CIFAR-10 across six conditions (extended deterministic training, direct measurement of latent movement during inference, a post-hoc decoder fairness control on a backpropagation network, a matched-budget PC vs BP comparison, a five-point Langevin temperature sweep, and trajectory-integrated MCPC training) find that the K-way energy probe sits below softmax in every condition.
  • The gap is stable across training procedures within the discriminative PC family: final-state and trajectory-integrated training yield probes whose AUROC_2 values differ by less than 10^-3 at deterministic evaluation. The authors stress that the empirical regime is small (single seed, 2.1M-parameter network, 1280 test images).
  • The paper frames the work as a negative result meant to invite replication and outlines scenarios where the reduction may not hold (e.g., bidirectional/prospective/generative PC or non-CE energy formulations).
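
The reduction in the second key point can be illustrated with a toy calculation. The sketch below is not the authors' code; it assumes the idealized limit in which latents stay exactly at their feedforward values, so that hidden-layer prediction errors vanish and the settled energy for a clamped class k is just the cross-entropy term, E_k = -log p_k. In that limit the monotone function is the identity; the preprint describes its decomposition only as approximate, and its exact form is not given in this summary. The `residuals` argument is a hypothetical stand-in for whatever latent movement would contribute in a real PCN.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def k_way_energies(logits, residuals=None):
    """Energy per clamped class hypothesis under the frozen-latent assumption.

    With latents left at their feedforward values, only the CE output term
    remains: E_k = -log p_k. `residuals` (hypothetical) adds a per-class
    term that is not tied to the softmax output.
    """
    logp = log_softmax(logits)
    if residuals is None:
        residuals = [0.0] * len(logits)
    return [-lp + r for lp, r in zip(logp, residuals)]


def energy_margin(energies):
    """Runner-up energy minus lowest energy (larger means more confident)."""
    lowest, second = sorted(energies)[:2]
    return second - lowest


def log_softmax_margin(logits):
    """Top log-probability minus runner-up log-probability."""
    top, second = sorted(log_softmax(logits), reverse=True)[:2]
    return top - second


logits = [2.0, 0.5, -1.0, 0.1]
print(energy_margin(k_way_energies(logits)), log_softmax_margin(logits))
```

With zero residual the two printed margins agree up to floating-point rounding, which is the sense in which the probe carries no extra information in this limit. Passing a nonzero `residuals` list shifts the energy margin away from the softmax margin without that shift being linked to correctness by any training signal.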

Abstract

We present this as a negative result with an explanatory mechanism, not as a formal upper bound. Predictive coding networks (PCNs) admit a K-way energy probe in which each candidate class is fixed as a target, inference is run to settling, and the per-hypothesis settled energies are compared. The probe appears to read a richer signal source than softmax, since the per-hypothesis energy depends on the entire generative chain. We argue this appearance is misleading under the standard Pinchetti-style discriminative PC formulation. We present an approximate reduction showing that with target-clamped CE-energy training and effectively-feedforward latent dynamics, the K-way energy margin decomposes into a monotone function of the log-softmax margin plus a residual that is not trained to correlate with correctness. The decomposition predicts that the structural probe should track softmax from below. We test this across six conditions on CIFAR-10: extended deterministic training, direct measurement of latent movement during inference, a post-hoc decoder fairness control on a backpropagation network, a matched-budget PC vs BP comparison, a five-point Langevin temperature sweep, and trajectory-integrated MCPC training. In every condition the probe sat below softmax. The gap was stable across training procedures within the discriminative PC family. Final-state and trajectory-integrated training produced probes whose AUROC_2 values differed by less than 10^-3 at deterministic evaluation. The empirical regime is small: single seed, 2.1M-parameter network, 1280 test images. We frame the result as a preprint inviting replication. We discuss conditions under which the decomposition does not apply (bidirectional PC, prospective configuration, generative PC, non-CE energy formulations) and directions for productive structural probing the analysis does not foreclose.
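
The abstract compares probes by AUROC_2 without defining it. Assuming it denotes the type-2 AUROC used in metacognition work (how well a confidence score separates correct from incorrect predictions), a minimal sketch of the metric follows. The function name and the example scores are illustrative, not taken from the paper.

```python
def type2_auroc(confidence, correct):
    """Probability that a correct prediction receives a higher confidence
    score than an incorrect one, counting ties as one half.

    This is the Mann-Whitney U statistic normalized by the number of
    (correct, incorrect) pairs. The pairwise loop is quadratic, which is
    acceptable for a test set on the order of a thousand images.
    """
    pos = [c for c, ok in zip(confidence, correct) if ok]
    neg = [c for c, ok in zip(confidence, correct) if not ok]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect prediction")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative scores only: three correct predictions, one incorrect.
scores = [0.9, 0.5, 0.7, 0.4]
is_correct = [True, False, True, True]
print(type2_auroc(scores, is_correct))
```

Under this reading, "the probe sat below softmax" means that feeding the K-way energy margin in as `confidence` gives a lower value than feeding in the softmax-based score for the same predictions.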