Cross-Entropy Is Load-Bearing: A Pre-Registered Scope Test of the K-Way Energy Probe on Bidirectional Predictive Coding
arXiv cs.CL / 4/24/2026
💬 Opinion · Models & Research
Key Points
- A pre-registered study tests an earlier claim that the K-way energy probe in predictive coding networks reduces to a monotone function of the log-softmax margin (see the first sketch after this list), focusing on how sensitive that reduction is to removing cross-entropy (CE) from training.
- When standard predictive coding is trained without CE (using MSE instead; the loss swap is sketched below), the probe no longer matches the softmax readout: probe accuracy stays below softmax accuracy, with a statistically significant negative gap across 10 CIFAR-10 seeds.
- In bidirectional predictive coding (bPC), the probe exceeds softmax accuracy on all seeds, but the study’s manipulation check finds that bPC does not meaningfully increase latent movement at the matched scale (one way to quantify latent movement is sketched below).
- Removing CE alone roughly halves the probe–softmax gap, indicating CE is a key “load-bearing” component; CE training also yields much larger output logit norms than MSE or bPC training (see the logit-norm helper below).
- Temperature-scaling ablations decompose the effect further: roughly 66% of the probe–softmax gap is a logit-scale effect that temperature rescaling removes, while the remaining ~34% reflects a scale-invariant ranking advantage of CE-trained representations (see the temperature-scaling sketch below).
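To make the claimed reduction concrete, here is a minimal numpy sketch, assuming the probe scores each of the K candidate one-hot labels by the cross-entropy output energy it induces (`kway_energy_probe_ce` is a hypothetical name, not the paper’s code). Under that assumption the probe’s ranking coincides with the softmax ranking and its margin equals the log-softmax margin:

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax."""
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

def kway_energy_probe_ce(logits):
    """Score each of the K candidate labels by its CE output energy
    E_k = -log p(k); return the lowest-energy class and the energy
    margin to the runner-up."""
    energies = -log_softmax(logits)
    order = np.argsort(energies)           # ascending: best candidate first
    margin = energies[order[1]] - energies[order[0]]
    return order[0], margin

rng = np.random.default_rng(0)
logits = rng.normal(size=10)               # toy 10-way logits
pred, margin = kway_energy_probe_ce(logits)

ls = np.sort(log_softmax(logits))[::-1]
assert pred == np.argmax(log_softmax(logits))   # same ranking as softmax
assert np.isclose(margin, ls[0] - ls[1])        # margin == log-softmax margin
```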
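The CE-removal manipulation amounts to swapping the output loss. Below is a minimal PyTorch sketch of that swap, assuming MSE is taken against one-hot targets on the logits (the paper may instead regress onto probabilities); the toy model and `train_step` are illustrative, not the study’s architecture:

```python
import torch
import torch.nn as nn

# Toy CIFAR-10-shaped classifier; the real study uses a predictive
# coding network, not this feed-forward stand-in.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

ce = nn.CrossEntropyLoss()
mse = nn.MSELoss()

def train_step(x, y, use_ce):
    logits = model(x)
    if use_ce:
        loss = ce(logits, y)                        # CE condition
    else:
        onehot = nn.functional.one_hot(y, 10).float()
        loss = mse(logits, onehot)                  # MSE condition (CE removed)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

x = torch.randn(8, 3, 32, 32)                       # toy batch
y = torch.randint(0, 10, (8,))
print(train_step(x, y, use_ce=False))
```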
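For the manipulation check, “latent movement” needs an operational definition. One plausible choice, sketched here, is the mean L2 displacement of latent states from their initialization to their relaxed values after inference; the paper’s exact metric may differ:

```python
import numpy as np

def latent_movement(z_init, z_relaxed):
    """Mean L2 displacement of latents over inference, one way to
    operationalize 'latent movement' (assumed metric, not the paper's)."""
    return np.linalg.norm(z_relaxed - z_init, axis=-1).mean()

z0 = np.zeros((8, 64))                        # toy batch of 64-d latents
z1 = z0 + 0.01 * np.random.default_rng(1).normal(size=z0.shape)
print(latent_movement(z0, z1))                # small value => little movement
```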
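The logit-norm comparison is straightforward to reproduce on any trained model’s outputs; `mean_logit_norm` is an illustrative helper and the scales below are toy values, not the paper’s measurements:

```python
import numpy as np

def mean_logit_norm(logits):
    """Mean L2 norm of output logits over a batch. The study finds this
    is much larger after CE training than after MSE or bPC training."""
    return np.linalg.norm(logits, axis=1).mean()

rng = np.random.default_rng(2)
ce_logits = 8.0 * rng.normal(size=(128, 10))   # CE-style: large scale (toy)
mse_logits = rng.normal(size=(128, 10))        # MSE-style: small scale (toy)
print(mean_logit_norm(ce_logits), mean_logit_norm(mse_logits))
```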
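The temperature ablation rests on standard temperature scaling: fit a single T > 0 minimizing validation NLL of softmax(logits / T), rescale, and see how much of the probe–softmax gap survives. A sketch of the fitting step, assuming the common Guo-et-al.-style procedure (`fit_temperature` is an illustrative name, and the gap numbers are toy values):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fit_temperature(logits, labels):
    """Fit a scalar temperature T minimizing validation NLL of
    softmax(logits / T) (standard temperature scaling)."""
    def nll(T):
        z = logits / T
        z = z - z.max(axis=1, keepdims=True)
        logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -logp[np.arange(len(labels)), labels].mean()
    return minimize_scalar(nll, bounds=(0.05, 20.0), method="bounded").x

# Decompose the probe-softmax gap: the part that vanishes at the fitted
# temperature is the scale effect (~66% per the study); the remainder is
# the scale-invariant ranking advantage (~34%). Toy gaps below.
gap_T1, gap_Tstar = 0.06, 0.02               # illustrative, not the paper's
scale_frac = (gap_T1 - gap_Tstar) / gap_T1   # ~0.67 removable by rescaling
rank_frac = gap_Tstar / gap_T1               # ~0.33 scale-invariant residual
print(scale_frac, rank_frac)
```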