Training Transformers as a Universal Computer

arXiv cs.AI / 4/29/2026


Key Points

  • The paper shows that a small transformer can learn to carry out programs written in MicroPy, a simplified but computationally universal programming language.
  • Given procedure definitions and a target expression, the model predicts small-step execution, with PENCIL scaffolding keeping the computation space-efficient within a limited context window (see the sketch after this list).
  • After training on randomly generated (nonsensical) MicroPy programs, the transformer generalizes to multiple human-written tasks such as bit operations, binary addition/multiplication, and SAT verification/solving.
  • The study reports out-of-distribution generalization: the trained model can evaluate novel programs drawn from distributions different from the one it was trained on.
  • Overall, the results provide empirical evidence that standard transformers can be trained to function as a “universal computer” for computations expressible in MicroPy.
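To make the setup concrete, here is a minimal sketch of small-step execution over a MicroPy-like program. The nested-tuple encoding, the add1 procedure, and the stepping rules are illustrative assumptions; the paper's actual MicroPy syntax and semantics may differ. Each intermediate expression in the printed trace corresponds to one step the transformer is trained to predict.

```python
# Hypothetical procedure table: add1(x) -> x + 1 (illustrative, not from the paper).
PROCS = {
    "add1": (("x",), ("+", "x", 1)),
}

def step(expr):
    """Rewrite the leftmost reducible subexpression by one small step."""
    if isinstance(expr, int):
        return expr  # already a value
    op, *args = expr
    # Reduce arguments left to right before applying the operator.
    for i, a in enumerate(args):
        if not isinstance(a, int):
            return (op, *args[:i], step(a), *args[i + 1:])
    if op == "+":
        return args[0] + args[1]
    if op in PROCS:
        params, body = PROCS[op]
        return substitute(body, dict(zip(params, args)))
    raise ValueError(f"unknown operator {op!r}")

def substitute(body, env):
    """Replace parameter names in a procedure body with argument values."""
    if isinstance(body, str):
        return env.get(body, body)
    if isinstance(body, tuple):
        return tuple(substitute(b, env) for b in body)
    return body

expr = ("add1", ("add1", 1))
while not isinstance(expr, int):
    print(expr)   # each printed line is one small step the model would predict
    expr = step(expr)
print(expr)       # -> 3
```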

Abstract

We demonstrate that a small transformer can learn to execute programs in MicroPy, a simplified yet computationally universal programming language. Given procedure definitions together with an expression to evaluate, the transformer predicts small-step execution using PENCIL scaffolding for space-efficient execution within a bounded context window. After training on randomly generated, meaningless MicroPy programs, the learned transformer generalizes to various human-written programs, including bit copying and flipping, binary addition and multiplication, and SAT verification and solving. We note that the trained model achieves out-of-distribution generalization, i.e., it can evaluate novel programs drawn from a distribution different from the training distribution over programs. Since MicroPy can express any computation, our results provide empirical evidence that a standard transformer can be trained to act as a universal computer.
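The PENCIL scaffolding mentioned in the abstract refers to erasing finished intermediate computation from the context. Below is a minimal sketch, assuming the reduction rule takes the form C [CALL] T [SEP] A [RETURN] ⇒ C A, i.e., the intermediate thoughts T are erased once the answer A is produced. The token names and the exact rule are assumptions based on the PENCIL framework, not details confirmed by this paper.

```python
CALL, SEP, RET = "[CALL]", "[SEP]", "[RETURN]"

def reduce_once(tokens):
    """Apply the erasure rule to the innermost completed [CALL]...[RETURN]."""
    r = tokens.index(RET)                              # first completed return
    c = max(i for i in range(r) if tokens[i] == CALL)  # matching (last) call before it
    s = tokens.index(SEP, c, r)                        # separates thoughts from answer
    # Keep the context before [CALL] and the answer; drop the thoughts.
    return tokens[:c] + tokens[s + 1:r] + tokens[r + 1:]

trace = ["f(2)", CALL, "step1", "step2", SEP, "3", RET, "done"]
print(reduce_once(trace))   # -> ['f(2)', '3', 'done']
```

Applying this rule after every completed subcomputation keeps only live context in the window, which is what makes executing long programs feasible under a bounded context length.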