Abstract
We measure how much one extra recurrence is worth to a looped (depth-recurrent) language model, expressed in equivalent unique parameters. From an iso-depth sweep of 116 pretraining runs across recurrence counts r \in \{1, 2, 4, 8\} spanning {\sim}50\times in training compute, we fit a joint scaling law L = E + A\,(N_\text{once} + r^{\varphi} N_\text{rec})^{-\alpha} + B\,D^{-\beta} and recover a new recurrence-equivalence exponent \varphi = 0.46 at R^2 = 0.997. Intuitively, \varphi tells us whether looping a block r times is equivalent in validation loss to r unique blocks of a non-looped model (full equivalence, \varphi{=}1) or to a single block run repeatedly with no capacity gain (\varphi{=}0). Our \varphi = 0.46 sits in between: because effective capacity grows only as r^{0.46} while the compute cost of looping grows as r, each additional recurrence predictably increases validation loss at matched training compute. For example, at r{=}4 a 410M looped model performs on par with a 580M non-looped model, but pays the training cost of a 1B non-looped one. On a five-axis downstream evaluation, the gap persists on parametric-knowledge tasks and closes on simple open-book tasks, while reasoning tasks are not resolvable at our compute budgets. For any looped LM, our \varphi converts the design choice of r into a predictable validation-loss cost, and future training recipes and architectures can be compared by how much they raise \varphi above 0.46.
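The parameter accounting above can be sketched in a few lines. This is a minimal illustration, not the paper's code: \varphi = 0.46 is the fitted exponent reported in the abstract, but the 220M/190M once/recurrent split below is a hypothetical split chosen only so that the numbers land near the abstract's 410M \to 580M example; the actual architecture may differ.

```python
# Effective-parameter accounting for a looped (depth-recurrent) LM.
# phi = 0.46 is the fitted recurrence-equivalence exponent from the abstract.
# The n_once / n_rec split is HYPOTHETICAL, picked to match the 410M example.

PHI = 0.46

def effective_params(n_once: float, n_rec: float, r: int, phi: float = PHI) -> float:
    """Loss-equivalent unique parameters: N_once + r^phi * N_rec."""
    return n_once + (r ** phi) * n_rec

def compute_equivalent_params(n_once: float, n_rec: float, r: int) -> float:
    """Per-token training compute scales like N_once + r * N_rec."""
    return n_once + r * n_rec

n_once, n_rec, r = 220e6, 190e6, 4  # hypothetical split of a 410M looped model

print(f"unique params:      {(n_once + n_rec) / 1e6:.0f}M")                       # 410M
print(f"loss-equivalent:    {effective_params(n_once, n_rec, r) / 1e6:.0f}M")     # ~580M
print(f"compute-equivalent: {compute_equivalent_params(n_once, n_rec, r) / 1e6:.0f}M")  # ~980M
```

The two limiting cases fall out directly: with \varphi{=}1, `effective_params` reduces to N_once + r\,N_rec (looping as good as unique depth), and with \varphi{=}0 it reduces to N_once + N_rec (no capacity gain from looping).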