A Boltzmann-machine-enhanced Transformer For DNA Sequence Classification

arXiv cs.AI · March 30, 2026


Key Points

  • The paper proposes a “Boltzmann-machine-enhanced” Transformer for DNA sequence classification that aims to better capture latent site interactions, combinatorial regulation, and epistasis-like higher-order dependencies than standard softmax attention.
  • It augments (or replaces) continuous softmax attention with structured binary query–key gating variables governed by a Boltzmann-style energy function, enabling explicit modeling of pairwise and higher-order interactions.
  • Because posterior inference over discrete gating graphs is intractable, the method uses mean-field variational inference to estimate edge activation probabilities and applies Gumbel-Softmax to gradually convert continuous estimates into near-discrete gates.
  • Training jointly optimizes a classification loss and an energy loss, encouraging both predictive accuracy and low-energy, stable, interpretable structures; the authors derive the final objective from the variational free energy and the mean-field fixed-point equations.
  • Overall, the work presents a unified framework for combining Boltzmann machines, differentiable discrete optimization, and Transformers to perform structured learning on biological sequence data.
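The paper's exact energy function is not reproduced here, but a Boltzmann-style energy consistent with the description above — local biases from query–key similarity, learnable pairwise edge interactions, and hidden units for higher-order structure — might take the form (all symbols below are illustrative, not the authors' notation):

$$
E(g, h) \;=\; -\sum_{ij} b_{ij}\, g_{ij} \;-\; \sum_{(ij) \neq (kl)} J_{ij,kl}\, g_{ij} g_{kl} \;-\; \sum_{m} c_m h_m \;-\; \sum_{ij,\, m} W_{ij,m}\, g_{ij} h_m,
$$

where $g_{ij} \in \{0,1\}$ gates the edge from query $i$ to key $j$, $b_{ij} = q_i^\top k_j / \sqrt{d}$ is the similarity-derived local bias, $J$ captures synergy/competition between edges, and $h_m \in \{0,1\}$ are latent hidden units. A mean-field approximation with edge probabilities $\mu_{ij} = \mathbb{E}[g_{ij}]$ and $\hat{h}_m = \mathbb{E}[h_m]$ would then yield fixed-point equations of the familiar form

$$
\mu_{ij} = \sigma\!\Big( b_{ij} + \sum_{(kl)} J_{ij,kl}\, \mu_{kl} + \sum_m W_{ij,m}\, \hat{h}_m \Big), \qquad
\hat{h}_m = \sigma\!\Big( c_m + \sum_{ij} W_{ij,m}\, \mu_{ij} \Big).
$$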

Abstract

DNA sequence classification requires not only high predictive accuracy but also the ability to uncover latent site interactions, combinatorial regulation, and epistasis-like higher-order dependencies. Although the standard Transformer provides strong global modeling capacity, its softmax attention is continuous, dense, and weakly constrained, making it better suited for information routing than explicit structure discovery. In this paper, we propose a Boltzmann-machine-enhanced Transformer for DNA sequence classification. Built on multi-head attention, the model introduces structured binary gating variables to represent latent query-key connections and constrains them with a Boltzmann-style energy function. Query-key similarity defines local bias terms, learnable pairwise interactions capture synergy and competition between edges, and latent hidden units model higher-order combinatorial dependencies. Since exact posterior inference over discrete gating graphs is intractable, we use mean-field variational inference to estimate edge activation probabilities and combine it with Gumbel-Softmax to progressively compress continuous probabilities into near-discrete gates while preserving end-to-end differentiability. During training, we jointly optimize classification and energy losses, encouraging the model to achieve accurate prediction while favoring low-energy, stable, and interpretable structures. We further derive the framework from the energy function and variational free energy to the mean-field fixed-point equations, Gumbel-Softmax relaxation, and the final joint objective. The proposed framework provides a unified view of integrating Boltzmann machines, differentiable discrete optimization, and Transformers for structured learning on biological sequences.
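To make the inference step concrete, here is a minimal numerical sketch of damped mean-field updates for edge activation probabilities followed by a Gumbel-based relaxation that pushes them toward near-discrete gates. This is not the authors' code: the bias vector, interaction matrix `J`, damping factor, and the binary-Concrete (Gumbel-sigmoid) form of the relaxation are all illustrative assumptions consistent with the abstract's description.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field_gates(bias, J, n_iters=20):
    """Damped mean-field updates for edge activation probabilities mu.

    bias : (E,) local bias per candidate edge (e.g. scaled query-key scores)
    J    : (E, E) symmetric pairwise interaction matrix, zero diagonal
    """
    mu = sigmoid(bias)                        # start at the independent solution
    for _ in range(n_iters):
        field = bias + J @ mu                 # effective field on each edge
        mu = 0.5 * mu + 0.5 * sigmoid(field)  # damping stabilizes the fixed point
    return mu

def gumbel_sigmoid(mu, tau, rng):
    """Binary-Concrete relaxation: samples approach {0, 1} as tau -> 0."""
    eps = 1e-9
    logits = np.log(mu + eps) - np.log(1.0 - mu + eps)
    u = rng.uniform(eps, 1.0 - eps, size=mu.shape)
    noise = np.log(u) - np.log(1.0 - u)       # logistic noise (Gumbel difference)
    return sigmoid((logits + noise) / tau)

rng = np.random.default_rng(0)
bias = rng.normal(size=6)
J = rng.normal(scale=0.1, size=(6, 6))
J = 0.5 * (J + J.T)
np.fill_diagonal(J, 0.0)

mu = mean_field_gates(bias, J)               # continuous edge probabilities
gates = gumbel_sigmoid(mu, tau=0.1, rng=rng) # near-discrete gate samples
```

In a real implementation the temperature `tau` would be annealed over training so the gates harden gradually while gradients still flow through the relaxation.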
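The joint objective described above — a classification term plus an energy term over the gate configuration — can be sketched as follows. The weighting `lam`, the quadratic energy form, and all variable names are hypothetical; the point is only to show how the two losses combine so that accurate prediction and low-energy gate structures are encouraged together.

```python
import numpy as np

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class under predicted probabilities."""
    return -np.log(probs[label] + 1e-12)

def gate_energy(gates, bias, J):
    """Boltzmann-style energy of a gate configuration (lower = more stable)."""
    return -bias @ gates - 0.5 * gates @ J @ gates

def joint_loss(probs, label, gates, bias, J, lam=0.1):
    """Classification loss plus lam-weighted energy loss (lam is illustrative)."""
    return cross_entropy(probs, label) + lam * gate_energy(gates, bias, J)

# Toy example: gates aligned with positive biases have lower energy.
bias = np.array([1.0, 1.0, -1.0])
J = np.zeros((3, 3))
probs = np.array([0.7, 0.2, 0.1])

aligned = np.array([1.0, 1.0, 0.0])   # opens positively biased edges
opposed = np.array([0.0, 0.0, 1.0])   # opens the negatively biased edge
loss_aligned = joint_loss(probs, 0, aligned, bias, J)
loss_opposed = joint_loss(probs, 0, opposed, bias, J)
```

Minimizing the second term alone would drive the gates toward low-energy configurations; the classification term keeps them predictive, matching the paper's stated goal of accuracy plus stable, interpretable structure.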