C-voting: Confidence-Based Test-Time Voting without Explicit Energy Functions

arXiv cs.LG / April 16, 2026


Key Points

  • The paper introduces confidence-based voting (C-voting), a test-time scaling method for recurrent latent models that uses multiple candidate latent trajectories and selects the one maximizing the average of top-1 prediction probabilities as a confidence proxy.
  • It reports that C-voting achieves 4.9% higher accuracy on Sudoku-hard than an energy-based voting strategy, highlighting improved performance over approaches that rely on explicit energy functions.
  • A key contribution is that C-voting can be applied to recurrent models even when they do not have explicit energy functions, making it more broadly compatible with existing model designs.
  • The authors propose an attention-based recurrent variant with randomized initial states (ItrSA++), and show that when combined with C-voting it outperforms HRM on Sudoku-extreme (95.2% vs. 55.0%) and Maze (78.6% vs. 74.5%).

Abstract

Neural network models with latent recurrent processing, where identical layers are recursively applied to the latent state, have gained attention as promising models for reasoning tasks. A strength of such models is that they enable test-time scaling: performance can be enhanced at test time without additional training. Models such as the Hierarchical Reasoning Model (HRM) and Artificial Kuramoto Oscillatory Neurons (AKOrN) can facilitate deeper reasoning by increasing the number of recurrent steps, enabling the completion of challenging tasks including Sudoku, Maze solving, and AGI benchmarks. In this work, we introduce confidence-based voting (C-voting), a test-time scaling strategy designed for recurrent models with multiple latent candidate trajectories. C-voting initializes the latent state with multiple randomly drawn candidates and selects the trajectory that maximizes the average of the top-1 probabilities of the predictions, reflecting the model's confidence. On Sudoku-hard, C-voting yields 4.9% higher accuracy than the energy-based voting strategy, which is specific to models with explicit energy functions. An essential advantage of C-voting is its broad applicability: it can be applied to recurrent models without requiring an explicit energy function. Finally, we introduce a simple attention-based recurrent model with randomized initial values, named ItrSA++, and demonstrate that when combined with C-voting, it outperforms HRM on Sudoku-extreme (95.2% vs. 55.0%) and Maze (78.6% vs. 74.5%) tasks.
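To make the selection rule concrete, here is a minimal sketch of the confidence proxy described above: among K candidate trajectories, pick the one whose predictions have the highest mean top-1 softmax probability. The shapes, the `c_voting` helper, and the toy logits below are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of C-voting: given output logits from K candidate latent
# trajectories, keep the candidate the model is most confident about,
# measured by the mean top-1 softmax probability across output positions.
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def c_voting(candidate_logits):
    """Select the most confident candidate trajectory.

    candidate_logits: array of shape (K, N, C) -- K candidates,
    N output positions (e.g. Sudoku cells), C classes.
    Returns (best_candidate_index, predictions_of_that_candidate).
    """
    probs = softmax(candidate_logits, axis=-1)   # (K, N, C)
    top1 = probs.max(axis=-1)                    # (K, N): top-1 prob per position
    confidence = top1.mean(axis=-1)              # (K,): average confidence
    best = int(confidence.argmax())
    return best, candidate_logits[best].argmax(axis=-1)

# Toy usage: 3 candidates, 4 positions, 5 classes.
logits = np.zeros((3, 4, 5))
logits[0, :, 0] = 1.0   # mildly peaked predictions
logits[1, :, 2] = 6.0   # sharply peaked -> highest confidence
logits[2] = 0.0         # uniform -> lowest confidence
best, preds = c_voting(logits)
print(best, preds)      # candidate 1 wins; it predicts class 2 everywhere
```

In a real recurrent model, `candidate_logits` would come from running the same weights forward from K randomly initialized latent states, so the only extra cost at test time is the K parallel rollouts.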