Multi-Perspective Transformers in ARC-AGI-2 Challenge

arXiv cs.LG / 5/5/2026


Key Points

  • The paper presents an approach to solving ARC-AGI-2, a visual reasoning benchmark focused on generalization from few examples and flexible rule application.
  • It uses TinyLM together with test-time fine-tuning techniques, specifically Test-Time Training (TTT) and Products of Experts (POE), to improve puzzle-solving performance; illustrative sketches of both techniques appear below.
  • The reported results show 96.1% accuracy on the training set, but a substantially lower 21.7% on the evaluation set, indicating remaining generalization challenges.
  • The work emphasizes transformer-based, multi-perspective modeling strategies as a pathway toward more human-intuitive visual reasoning systems.
  • The benchmark and methods are positioned as a step in measuring progress toward AGI-like capabilities via interpretable, rule-based visual tasks.

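The paper's own code is not reproduced in this summary; as a rough illustration of what test-time training means in this setting, the sketch below adapts a private copy of a small model on a single puzzle's demonstration pairs before predicting the held-out test output. The model interface, tokenization, and hyperparameters are assumptions for illustration, not the authors' TinyLM implementation.

```python
# A minimal, hypothetical sketch of test-time training (TTT) for one ARC puzzle.
# `base_model`, the token layout, and the hyperparameters are illustrative
# placeholders, not the paper's actual setup.
import copy
import torch
import torch.nn.functional as F

def test_time_train(base_model, demo_pairs, steps=32, lr=1e-4):
    """Fine-tune a private copy of the model on one puzzle's demonstration pairs."""
    model = copy.deepcopy(base_model)           # keep the shared base weights untouched
    model.train()
    opt = torch.optim.AdamW(model.parameters(), lr=lr)

    for _ in range(steps):
        for inp_tokens, out_tokens in demo_pairs:        # typically only a few pairs
            logits = model(inp_tokens)                   # (seq_len, vocab_size)
            loss = F.cross_entropy(logits, out_tokens)   # predict the output-grid tokens
            opt.zero_grad()
            loss.backward()
            opt.step()

    model.eval()
    return model

# Usage: serialize each grid into a token sequence (e.g. row by row), adapt the
# model per puzzle, then decode the prediction for the test input.
# puzzle_model = test_time_train(tiny_lm, demo_pairs)
# prediction = puzzle_model(test_input_tokens).argmax(dim=-1)
```
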
Abstract

ARC-AGI-2 is a benchmark of human-intuitive visual puzzles that measures a machine's ability to generalize from limited examples, interpret symbolic meaning, and flexibly apply rules in varying contexts. In this paper, we discuss our approach to solving the ARC-AGI-2 puzzles with TinyLM, augmented with fine-tuning at test time, including Test-Time Training (TTT) and Products of Experts (POE). Our model achieves 96.1% accuracy on the training set and 21.7% accuracy on the evaluation set.
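
For grid puzzles, a Products-of-Experts scheme is commonly realized by scoring each candidate answer under several transformed views of the task (rotations and reflections) and combining the views multiplicatively, i.e. summing log-probabilities. The sketch below shows only that aggregation step under those assumptions; the transform set, scoring model, and candidate generation used in the paper may differ.

```python
# Hypothetical Product-of-Experts (PoE) aggregation over grid perspectives.
# Each "expert" is the same scorer applied to a differently transformed view of
# the puzzle; candidates are ranked by the sum of per-view log-probabilities
# (equivalently, the product of per-view probabilities).
import numpy as np

def dihedral_views(grid: np.ndarray):
    """Yield the 8 rotations/reflections of a grid."""
    for k in range(4):
        rot = np.rot90(grid, k)
        yield rot
        yield np.fliplr(rot)

def poe_score(log_prob_fn, test_input, candidate_output):
    """Score one candidate as sum over views v of log P(T_v(candidate) | T_v(input))."""
    return sum(
        log_prob_fn(inp_view, out_view)
        for inp_view, out_view in zip(dihedral_views(test_input),
                                      dihedral_views(candidate_output))
    )

def pick_answer(log_prob_fn, test_input, candidates):
    """Return the candidate output grid with the highest PoE score."""
    return max(candidates, key=lambda c: poe_score(log_prob_fn, test_input, c))
```

Scoring the same rule from several geometric perspectives and multiplying the resulting probabilities penalizes candidates that only look plausible from one orientation, which is one way a "multi-perspective" ensemble can sharpen predictions on ARC-style tasks.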