https://arcprize.org/arc-agi/3
Interesting stuff: they find that all well-performing models probably have ARC-like data in their training sets, based on inspecting the models' reasoning traces.
Also, all frontier models score below 1% on round 3. Lots of room for improvement, especially considering the prizes for rounds 1-2 have not been claimed yet (efficiency is still lacking).
