ILDR: Geometric Early Detection of Grokking
arXiv cs.LG / 4/24/2026
📰 NewsModels & Research
Key Points
- The paper introduces ILDR (Inter/Intra-class Distance Ratio), a geometric metric computed from second-to-last-layer representations to detect “grokking” early, before validation accuracy improves.
- Unlike prior indirect signals such as weight norms or GrokFast’s slow gradient EMA, ILDR shows a clear rise and crosses a fixed threshold (2.5× baseline) ahead of the grokking transition in validation.
- ILDR is derived from Fisher’s linear discriminant criterion, avoids eigendecomposition, and is computationally efficient (O(|C|^2 + N)) while being evaluated only on held-out data to reduce memorization confounds.
- Experiments on modular arithmetic and permutation group composition (S5) show ILDR leads the grokking transition by 9–73% of the training budget, with consistent timing across multiple random seeds and a large reduction in post-transition variance.
- Using ILDR for early stopping cuts average training by 18.6%, and optimizer interventions at the ILDR trigger indicate ILDR reflects representational conditions driving generalization (not just a downstream correlate).
Related Articles

GPT-5.5 is here. So is DeepSeek V4. And honestly, I am tired of version numbers.
Dev.to

I Built an AI Image Workflow with GPT Image 2.0 (+ Fixing Its Biggest Flaw)
Dev.to
Max-and-Omnis/Nemotron-3-Super-64B-A12B-Math-REAP-GGUF
Reddit r/LocalLLaMA

Building a Visual Infrastructure Layer: How We’re Solving the "Visual Trust Gap" for E-com
Dev.to
DeepSeek-V4 Runs on Huawei Ascend Chips at 85% Utilization — Here's What That Means for AI Infrastructure and Pricing
Dev.to