Transformer See, Transformer Do: Copying as an Intermediate Step in Learning Analogical Reasoning

arXiv cs.LG / 4/9/2026


Key Points

  • The paper investigates how to train transformer models to perform analogical reasoning on letter-string analogy tasks using meta-learning for compositionality (MLC).
  • It finds that adding explicit copying tasks to the training data helps transformers learn to attend to the most informative elements, making the analogies learnable.
  • The authors report improved generalization to new alphabets when training includes more heterogeneous datasets, with their 3-layer encoder-decoder model outperforming most frontier models.
  • The work shows partial generalization to compositions of transformations but limited ability to handle completely novel transformations, and it proposes an algorithmic approximation of the model’s computations.
  • Interpretability analyses indicate the model’s behavior can be steered in controlled ways consistent with the proposed algorithm, offering insights into parallels with human analogical reasoning.
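To make the task concrete, here is a minimal illustrative sketch (not code from the paper) of the two kinds of training items the key points describe: a letter-string analogy in the style of Hofstadter's "abc → abd" problems, and an explicit copying task where the target simply repeats the source. The specific transformation shown (incrementing the final letter) is an assumption for illustration; the paper trains on a broader set of transformations.

```python
# Hypothetical sketch of letter-string analogy and copying items,
# assuming an "increment the last letter" transformation as one example.

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def increment_last(s: str) -> str:
    """Replace the final letter with its alphabetical successor."""
    last = ALPHABET[(ALPHABET.index(s[-1]) + 1) % len(ALPHABET)]
    return s[:-1] + last

def analogy_item(source: str, target: str) -> tuple:
    """An analogy item: source : f(source) :: target : f(target)."""
    return (source, increment_last(source), target, increment_last(target))

def copying_item(source: str) -> tuple:
    """A copying item: the expected output is the input itself.
    Per the paper's finding, mixing such items into training helps the
    model attend to the informative elements of analogy problems."""
    return (source, source)

print(analogy_item("abc", "ijk"))  # ('abc', 'abd', 'ijk', 'ijl')
print(copying_item("pqr"))         # ('pqr', 'pqr')
```

Generalization to new alphabets, as tested in the paper, would correspond to swapping `ALPHABET` for an unseen symbol sequence at evaluation time.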

Abstract

Analogical reasoning is a hallmark of human intelligence, enabling us to solve new problems by transferring knowledge from one situation to another. Yet developing artificial intelligence systems capable of robust, human-like analogical reasoning has proven difficult. In this work, we train transformers using Meta-Learning for Compositionality (MLC) on an analogical reasoning task (letter-string analogies) and assess their generalization capabilities. We find that letter-string analogies become learnable when the models are guided to attend to the most informative problem elements, an effect induced by including copying tasks in the training data. Furthermore, generalization to new alphabets improves when models are trained on more heterogeneous datasets, and our 3-layer encoder-decoder model outperforms most frontier models. The MLC approach also enables some generalization to compositions of trained transformations, but not to completely novel transformations. To understand how the model operates, we identify an algorithm that approximates the model's computations. We verify this using interpretability analyses and show that the model can be steered precisely according to expectations derived from the algorithm. Finally, we discuss implications of our findings for the generalization capabilities of larger models and parallels to human analogical reasoning.