The Order Is The Message

arXiv cs.LG · March 27, 2026


Key Points

  • A controlled arXiv study on modular arithmetic (p = 9973) shows that changing only the ordering of training examples, while keeping everything else constant, can raise test accuracy to about 99.5% within a few hundred epochs, even when the training set covers just 0.3% of the input space.
  • The paper contrasts two fixed-ordering strategies that reach high accuracy quickly (by epochs 487 and 659) against an IID-ordering baseline that remains near 0.30% after 5,000 epochs, and notes that an adversarially structured ordering can suppress learning entirely (a toy version of this contrast is sketched after this list).
  • It reports that the learned model consistently builds a Fourier representation whose fundamental frequency corresponds to the Fourier dual of the ordering structure, implying the ordering itself carries recoverable information not contained in any single example.
  • The same fundamental frequency emerges across all seeds tested, regardless of initialization or training-set composition, suggesting an ordering-induced inductive bias rather than memorization.
  • The authors discuss implications for training efficiency and a reinterpretation of “grokking,” while warning that example ordering constitutes a side channel that could evade content-level auditing by embedding information in training structure rather than in explicit content.

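To make the setup concrete, here is a minimal sketch of the dataset and the ordering contrast. The paper's two fixed-ordering strategies are not described in this summary, so the arithmetic-progression ordering below is a hypothetical stand-in; only the task (addition mod p), the prime p = 9973, and the roughly 0.3% coverage figure come from the text above.

```python
# Minimal sketch of the ordering contrast, assuming the task is a + b mod p.
# The "structured" ordering is HYPOTHETICAL; only p = 9973 and the ~0.3%
# coverage figure are taken from the summary above.
import numpy as np

p = 9973
rng = np.random.default_rng(0)

# Sample ~0.3% of the p*p input space as the training set.
n_train = int(0.003 * p * p)
pairs = rng.integers(0, p, size=(n_train, 2))
labels = (pairs[:, 0] + pairs[:, 1]) % p

def iid_order(n, rng):
    """Baseline: a fresh uniform shuffle each epoch (IID presentation)."""
    return rng.permutation(n)

def structured_order(pairs, step=1):
    """Hypothetical fixed ordering: sort by (a + step*b) mod p, so that
    consecutive examples walk an arithmetic progression mod p."""
    key = (pairs[:, 0] + step * pairs[:, 1]) % p
    return np.argsort(key, kind="stable")

idx_fixed = structured_order(pairs)   # same fixed order every epoch
idx_iid = iid_order(n_train, rng)     # reshuffled order (baseline)
print(pairs[idx_fixed[:3]], labels[idx_fixed[:3]])
```

Both index sequences present exactly the same examples; the only difference is presentation order, which is the single variable the study manipulates.
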
Abstract

In a controlled experiment on modular arithmetic (p = 9973), varying only example ordering while holding all else constant, two fixed-ordering strategies achieve 99.5% test accuracy by epochs 487 and 659, respectively, from a training set comprising 0.3% of the input space, well below established sample-complexity lower bounds for this task under IID ordering. The IID baseline achieves 0.30% after 5,000 epochs from identical data. An adversarially structured ordering suppresses learning entirely. The generalizing model reliably constructs a Fourier representation whose fundamental frequency is the Fourier dual of the ordering structure, encoding information present in no individual training example; the same fundamental emerges across all seeds tested, regardless of initialization or training set composition. We discuss implications for training efficiency, the reinterpretation of grokking, and the safety risks of a channel that evades all content-level auditing.