AI Navigate

Transformers Can Learn Rules They've Never Seen: Proof of Computation Beyond Interpolation

arXiv cs.LG / 3/19/2026


Key Points

  • The paper tests whether transformers can infer rules absent from their training data, challenging interpolation-only accounts in two controlled experiments.
  • Experiment 1 uses a cellular automaton with an XOR rule and held-out input patterns: similarity-based predictors fail by construction, yet a two-layer transformer learns the rule, circuit extraction identifies an XOR computation, and multi-step constraint propagation proves essential.
  • Experiment 2 studies symbolic operator chains over integers with one operator pair held out, requiring intermediate-step proofs; across all 49 holdout pairs, the transformer surpasses every interpolation baseline and degrades without intermediate-step supervision.
  • The work also demonstrates that a standard transformer block can implement exact local Boolean rules, providing an existence proof that transformers can learn and communicate unseen rule structures, while leaving open when such behavior arises in large-scale training.
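The inseparability argument behind Experiment 1 can be checked directly: flipping any single bit of a bit pattern flips its XOR (parity) label, so every nearest neighbour of a held-out pattern carries the opposite label, and a 1-nearest-neighbour predictor must misclassify it. A minimal sketch of that argument (the 3-bit neighbourhood and Hamming metric are illustrative assumptions, not the paper's exact setup):

```python
from itertools import product

def xor_label(p):
    """Parity (XOR) label of a bit pattern."""
    out = 0
    for b in p:
        out ^= b
    return out

def hamming(a, b):
    """Number of positions where two patterns differ."""
    return sum(x != y for x, y in zip(a, b))

patterns = list(product([0, 1], repeat=3))

# Hold out one pattern (an arbitrary illustrative choice) and classify it
# with 1-NN over the remaining patterns.
held_out = (1, 1, 0)
train = [p for p in patterns if p != held_out]

nearest = min(train, key=lambda p: hamming(p, held_out))
prediction = xor_label(nearest)

# The nearest neighbour differs in exactly one bit, which flips the parity,
# so the 1-NN prediction is guaranteed to be wrong.
print(prediction, xor_label(held_out))  # prints: 1 0 (the labels disagree)
```

The same one-bit-flip argument applies to every held-out pattern, which is why the paper's similarity-based baselines score at chance or below on the held-out region.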

Abstract

A central question in the LLM debate is whether transformers can infer rules absent from training, or whether apparent generalisation reduces to similarity-based interpolation over observed examples. We test a strong interpolation-only hypothesis in two controlled settings: one where interpolation is ruled out by construction and proof, and one where success requires emitting intermediate symbolic derivations rather than only final answers. In Experiment 1, we use a cellular automaton with a pure XOR transition rule and remove specific local input patterns from training; since XOR is linearly inseparable, each held-out pattern's nearest neighbours have the opposite label, so similarity-based predictors fail on the held-out region. Yet a two-layer transformer recovers the rule (best 100%; 47/60 converged runs), and circuit extraction identifies XOR computation. Performance depends on multi-step constraint propagation: without unrolling, accuracy matches output bias (63.1%), while soft unrolling reaches 96.7%. In Experiment 2, we study symbolic operator chains over integers with one operator pair held out; the model must emit intermediate steps and a final answer in a proof-like format. Across all 49 holdout pairs, the transformer exceeds every interpolation baseline (mean 41.8%, up to 78.6%; mean KRR 4.3%; KNN and MLP score 0% on every pair), while removing intermediate-step supervision degrades performance. Together with a construction showing that a standard transformer block can implement exact local Boolean rules, these results provide an existence proof that transformers can learn rule structure not directly observed in training and express it explicitly. They rule out the strongest architectural form of interpolation-only accounts, namely that transformers cannot in principle discover and communicate unseen rules, while leaving open when such behaviour arises in large-scale language training.
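For concreteness, a one-dimensional cellular automaton whose transition is a pure XOR of neighbouring cells can be sketched as below; the rule-90-style neighbourhood (left XOR right) and periodic boundary are assumptions for illustration, not necessarily the paper's exact rule:

```python
def step(state):
    """One update of a 1-D cellular automaton in which each cell becomes
    the XOR of its left and right neighbours, with periodic boundaries.
    Illustrative neighbourhood choice, not the paper's exact rule."""
    n = len(state)
    return [state[(i - 1) % n] ^ state[(i + 1) % n] for i in range(n)]

# A single live cell spreads outward by repeated XOR:
# step([0, 0, 0, 1, 0, 0, 0]) → [0, 0, 1, 0, 1, 0, 0]
s = [0, 0, 0, 1, 0, 0, 0]
for _ in range(3):
    s = step(s)
```

Because every output bit is an exact XOR of inputs, the rule is fully determined by local Boolean structure, which is what makes it possible to hold out specific local patterns while the global rule remains recoverable.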