EMA Is Not All You Need: Mapping the Boundary Between Structure and Content in Recurrent Context
arXiv cs.AI / April 13, 2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper uses exponential moving average (EMA) traces as a controlled probe of what fixed-coefficient recurrent context can represent and what it fundamentally cannot (a minimal trace sketch follows this list).
- EMA traces encode temporal structure effectively: a Hebbian multi-timescale variant reaches 96% of a supervised BiGRU's accuracy on grammatical role assignment without any labels, and even outperforms it on structure-dependent roles.
- They also erase token identity: a 130M-parameter language model relying only on EMA context reaches a C4 perplexity of 260 (roughly 8× GPT-2's), exposing hard limits on content retention.
- A predictor ablation that swaps the linear predictor for full softmax attention yields identical loss, showing the bottleneck is not predictor capacity and localizing the gap to information the traces have already discarded.
- The authors argue that EMA traces perform lossy, data-independent compression; by the data processing inequality (spelled out after the sketch below), no downstream predictor can recover discarded information, so only learned, input-dependent selection can overcome fixed accumulation's irreversible dilution.
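
A minimal sketch of the kind of fixed-coefficient, multi-timescale EMA trace the paper probes. The decay values, shapes, and the `ema_traces` helper are illustrative assumptions, not the paper's exact configuration; the point is only that the update coefficients never depend on the input:

```python
import numpy as np

def ema_traces(embeddings: np.ndarray, decays=(0.5, 0.9, 0.99)) -> np.ndarray:
    """Accumulate fixed-coefficient EMA traces of token embeddings at
    several timescales. The blend weights are data-independent: every
    token is folded in with the same coefficient regardless of content."""
    _, d = embeddings.shape
    traces = np.zeros((len(decays), d))
    history = []
    for x in embeddings:                    # one pass over the token stream
        for i, lam in enumerate(decays):
            # fixed accumulation: z <- lam * z + (1 - lam) * x
            traces[i] = lam * traces[i] + (1.0 - lam) * x
        history.append(traces.copy())
    return np.stack(history)                # shape (T, len(decays), d)

# Unrolling the recursion shows why identity dilutes: a token seen k steps
# ago contributes with weight (1 - lam) * lam**k, vanishing geometrically.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(128, 16))         # toy embeddings: T=128, d=16
print(ema_traces(tokens).shape)             # (128, 3, 16)
```

Stacking several decays recovers multiple timescales, which is enough for the structural signal the paper reports, but each trace is still a weighted average that collapses which tokens occurred into a single diluted vector.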
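
To make the information-theoretic step explicit (notation mine, not the paper's): with tokens X, fixed traces Z = f(X), and any downstream predictor Ŷ = g(Z), the variables form a Markov chain, and the data processing inequality bounds what g can recover:

```latex
% Fixed-decay EMA (with z_0 = 0) unrolls to a geometric average: a token
% seen k steps ago carries weight (1-\lambda)\lambda^k, independent of content.
z_t \;=\; \lambda z_{t-1} + (1-\lambda)\,x_t
    \;=\; (1-\lambda)\sum_{k=0}^{t-1} \lambda^{k}\,x_{t-k}

% With Z = f(X) fixed and \hat{Y} = g(Z) any predictor,
% X \to Z \to \hat{Y} is a Markov chain, so the data processing
% inequality holds for every choice of g:
I\!\left(X;\hat{Y}\right) \;\le\; I\!\left(X;Z\right)
```

Because f is fixed in advance, the right-hand side is capped before any training occurs, which is the sense in which only learned, input-dependent selection can move it.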