AI Navigate

NeuroGame Transformer: Gibbs-Inspired Attention Driven by Game Theory and Statistical Physics

arXiv cs.AI / 3/20/2026


Key Points

  • The NeuroGame Transformer introduces a dual perspective on attention by treating tokens as players in a cooperative game and as interacting spins in a Gibbs-based physical system.
  • It uses Shapley values for global attribution and Banzhaf indices for local influence, combined via a learnable gate to form an external magnetic field that modulates attention.
  • Pairwise interactions are captured by an Ising-like energy with attention weights emerging as marginal probabilities under a Gibbs distribution, computed efficiently via mean-field equations.
  • To scale to long sequences, the method employs importance-weighted Monte Carlo estimators with Gibbs-distributed weights and provides theoretical convergence and a fairness-sensitivity trade-off controlled by an interpolation parameter.
  • Experimental results on SNLI and MNLI-matched show strong performance, surpassing ALBERT-Base and remaining highly competitive with RoBERTa-Base, with code released on GitHub.
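The gating and mean-field machinery in the points above can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the paper's implementation: `gated_field` and `mean_field_attention` are hypothetical names, the Shapley and Banzhaf scores are taken as given inputs, and the final normalization of mean-field marginals into attention weights is an assumption.

```python
import numpy as np

def gated_field(shapley, banzhaf, lam):
    """Interpolate global (Shapley) and local (Banzhaf) token scores via a
    gate lam in [0, 1] to form the external field h (illustrative)."""
    return lam * shapley + (1.0 - lam) * banzhaf

def mean_field_attention(h, J, beta=1.0, n_iter=50, tol=1e-6):
    """Mean-field marginals for an Ising-like energy
    E(s) = -sum_i h_i s_i - sum_{i<j} J_ij s_i s_j, with s_i in {-1, +1}.

    Iterates the standard self-consistency equations
    m_i = tanh(beta * (h_i + sum_j J_ij m_j)) and maps the resulting
    marginals P(s_i = +1) to normalized attention weights (an assumption)."""
    n = len(h)
    m = np.zeros(n)
    for _ in range(n_iter):
        eff = h + J @ m                # effective field on each spin
        m_new = np.tanh(beta * eff)
        if np.max(np.abs(m_new - m)) < tol:
            m = m_new
            break
        m = m_new
    p = (1.0 + m) / 2.0               # P(s_i = +1) under mean field
    attn = p / p.sum()                # normalize into attention weights
    return m, attn
```

With the pairwise couplings `J` set to zero, the fixed point reduces to `m = tanh(beta * h)`, so the external field alone determines the attention profile; nonzero `J` lets synergistic token pairs reinforce each other's marginals.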

Abstract

Standard attention mechanisms in transformers are limited by their pairwise formulation, which hinders the modeling of higher-order dependencies among tokens. We introduce the NeuroGame Transformer (NGT) to overcome this by reconceptualizing attention through a dual perspective: tokens are treated simultaneously as players in a cooperative game and as interacting spins in a statistical physics system. Token importance is quantified using two complementary game-theoretic concepts -- Shapley values for global, permutation-based attribution and Banzhaf indices for local, coalition-level influence. These are combined via a learnable gating parameter to form an external magnetic field, while pairwise interaction potentials capture synergistic relationships. The system's energy follows an Ising Hamiltonian, with attention weights emerging as marginal probabilities under the Gibbs distribution, efficiently computed via mean-field equations. To ensure scalability despite the exponential coalition space, we develop importance-weighted Monte Carlo estimators with Gibbs-distributed weights. This approach avoids explicit exponential factors, ensuring numerical stability for long sequences. We provide theoretical convergence guarantees and characterize the fairness-sensitivity trade-off governed by the interpolation parameter. Experimental results demonstrate that the NeuroGame Transformer achieves strong performance on SNLI and MNLI-matched, outperforming some major efficient transformer baselines. On SNLI, it attains a test accuracy of 86.4% (with a peak validation accuracy of 86.6%), surpassing ALBERT-Base and remaining highly competitive with RoBERTa-Base. Code is available at https://github.com/dbouchaffra/NeuroGame-Transformer.
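Exact Shapley values require summing over the exponential coalition space, which is why the abstract turns to Monte Carlo estimation. The sketch below shows the standard permutation-sampling estimator that such methods build on; the paper's version additionally importance-weights samples with Gibbs-distributed weights, which is omitted here. The function name `shapley_mc`, the `value_fn` interface, and the additive toy game are illustrative assumptions.

```python
import numpy as np

def shapley_mc(value_fn, n, n_samples=200, rng=None):
    """Permutation-sampling Monte Carlo estimate of Shapley values for an
    n-player game with characteristic function value_fn(coalition).

    Each sampled permutation contributes one marginal contribution
    v(S + {i}) - v(S) per player i; averaging over permutations converges
    to the Shapley value. (Plain uniform sampling; no importance weights.)"""
    rng = np.random.default_rng(rng)
    phi = np.zeros(n)
    for _ in range(n_samples):
        perm = rng.permutation(n)
        coalition = []
        v_prev = value_fn(coalition)
        for i in perm:
            coalition.append(i)
            v_new = value_fn(coalition)
            phi[i] += v_new - v_prev   # marginal contribution of player i
            v_prev = v_new
    return phi / n_samples

# Illustrative additive game: v(S) is the sum of per-token weights in S.
weights = np.array([0.5, 1.5, 1.0])
v = lambda S: float(sum(weights[list(S)]))
phi = shapley_mc(v, 3, n_samples=50, rng=0)
# For an additive game, every marginal contribution of token i equals
# weights[i], so the estimate recovers the weights exactly.
```

For non-additive value functions the estimate is noisy, which motivates variance-reduction schemes such as the importance weighting described in the abstract.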