Learning to Learn-at-Test-Time: Language Agents with Learnable Adaptation Policies
arXiv cs.LG / 4/2/2026
Key Points
- Test-Time Learning (TTL) improves language agents by letting them iteratively refine behavior through repeated environment interactions at inference time.
- The paper argues that adaptation policies in TTL should be learned rather than hand-crafted, because optimal updates depend on the task environment and downstream improvement.
- It introduces Meta-TTL, a bi-level optimization framework where an inner loop runs standard TTL while an outer loop uses evolutionary search across task distributions to optimize the adaptation policy.
- Experiments on Jericho and WebArena-Lite (both in-distribution and out-of-distribution) show Meta-TTL consistently outperforming hand-crafted baselines across multiple meta-agent backbones.
- The authors conclude that the learned adaptation policy captures transferable strategies that generalize beyond the training task distribution.
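To make the bi-level structure concrete, here is a minimal sketch of the idea: an inner loop that performs test-time learning under a given adaptation policy, and an outer loop that evolves the policy's parameters by mutation and selection on mean downstream performance across a task distribution. All names (`lr`, `explore`, the toy tasks) are illustrative assumptions, not details from the paper.

```python
import random

random.seed(0)

# Hypothetical adaptation policy with two knobs (illustrative only):
#   "lr"      - how aggressively the agent revises behavior per interaction
#   "explore" - probability of trying a novel action instead of refining


def inner_ttl_loop(policy, task, steps=20):
    """Inner loop: standard test-time learning on one task.

    Toy stand-in: the agent's score climbs toward the task optimum at a
    rate governed by the adaptation policy. A real TTL loop would run
    environment interactions and refine the agent's behavior instead."""
    score = 0.0
    for _ in range(steps):
        if random.random() < policy["explore"]:
            step = random.uniform(-0.5, 1.0)             # exploratory update
        else:
            step = policy["lr"] * (task["optimum"] - score)  # greedy refinement
        score = min(task["optimum"], score + step)
    return score


def outer_evolutionary_search(tasks, generations=30, population=8):
    """Outer loop: (1+lambda)-style evolutionary search over adaptation
    policies, scored by mean TTL performance across the task distribution."""
    def fitness(policy):
        return sum(inner_ttl_loop(policy, t) for t in tasks) / len(tasks)

    best = {"lr": 0.05, "explore": 0.5}
    best_fit = fitness(best)
    for _ in range(generations):
        for _ in range(population):
            # Mutate the incumbent policy and keep the child if it scores higher.
            child = {
                k: min(1.0, max(0.0, best[k] + random.gauss(0, 0.05)))
                for k in best
            }
            f = fitness(child)
            if f > best_fit:
                best, best_fit = child, f
    return best, best_fit


# A toy "task distribution": three tasks with different optima (mean optimum 2.0).
tasks = [{"optimum": o} for o in (1.0, 2.0, 3.0)]
policy, fit = outer_evolutionary_search(tasks)
print(policy, round(fit, 2))
```

The outer loop never calls a gradient through the inner loop; it only observes downstream scores, which is what makes an evolutionary search a natural fit when the inner TTL process is non-differentiable.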