In-Place Test-Time Training
arXiv cs.LG / 4/8/2026
Key Points
- The paper argues that the traditional “train then deploy” approach prevents LLMs from dynamically adapting to new information during real-world usage, motivating Test-Time Training (TTT).
- It introduces “In-Place Test-Time Training,” which repurposes the final projection matrix inside MLP blocks as fast, adaptable weights, making the method a drop-in enhancement for existing LLM architectures.
- The authors replace TTT’s generic reconstruction objective with one aligned to next-token prediction in autoregressive language modeling, aiming to fix the objective misalignment that hurts practical performance.
- A chunk-wise update mechanism improves computational efficiency and remains compatible with context parallelism for scalability.
- Experiments show that applying this method can improve a 4B-parameter model on tasks with context lengths up to 128k, and training from scratch also yields consistent gains over related TTT approaches, supporting the framework as a step toward continual learning in LLMs.
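The core mechanics described above, treating a projection matrix as fast weights that are updated once per chunk against a prediction-aligned loss, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name, the squared-error surrogate loss, and all shapes are assumptions chosen so the gradient is analytic.

```python
import numpy as np

def chunkwise_ttt_update(W, hiddens, targets, chunk_size=4, lr=0.01):
    """Illustrative chunk-wise test-time update of a fast-weight matrix W.

    W        : (d_out, d_in) fast weights (stand-in for an MLP output projection)
    hiddens  : (T, d_in) hidden states entering the projection
    targets  : (T, d_out) prediction-aligned targets (stand-in for the
               next-token-prediction-aligned objective)

    Tokens within a chunk share the same weights, so they can be processed
    in parallel; W is updated once per chunk from the accumulated gradient.
    """
    W = W.copy()
    outputs = []
    for start in range(0, len(hiddens), chunk_size):
        h = hiddens[start:start + chunk_size]   # (c, d_in) chunk of hiddens
        y = targets[start:start + chunk_size]   # (c, d_out) chunk of targets
        pred = h @ W.T                          # forward pass with current W
        outputs.append(pred)
        grad = 2.0 * (pred - y).T @ h / len(h)  # analytic grad of mean sq. error
        W -= lr * grad                          # one update per chunk
    return W, np.vstack(outputs)
```

Because the weights adapt between chunks, predictions on later chunks of a sequence drawn from a fixed underlying mapping should improve over predictions on early chunks, which is the in-context adaptation effect TTT is after.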
Related Articles

Black Hat Asia
AI Business
[N] Just found out that Milla Jovovich is a dev, invested in AI, and just open sourced a project
Reddit r/MachineLearning
ALTK-Evolve: On-the-Job Learning for AI Agents
Hugging Face Blog
Context Windows Are Getting Absurd — And That's a Good Thing
Dev.to
Every AI Agent Registry in 2026, Compared
Dev.to