TARo: Token-level Adaptive Routing for LLM Test-time Alignment
arXiv cs.CL / 3/20/2026
📰 News · Models & Research
Key Points
- TARo introduces a token-level adaptive router that steers frozen LLMs toward structured reasoning entirely at inference time, guided by a reward model trained on step-wise mathematical traces.
- The method trains reward models to capture fine-grained logical consistency signals and uses a learnable token-level router to control how the reward model guides the base model.
- Experiments show TARo improves reasoning performance by up to 22.4% over the base model and 8.4% over existing token-level test-time alignment methods, and it generalizes from small to large backbones without retraining.
- TARo also boosts out-of-distribution clinical reasoning (MedXpertQA) and instruction following (AlpacaEval), extending test-time alignment from preference optimization to robust, cross-domain reasoning.
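The mechanism in the key points above can be sketched in a few lines: at each decoding step, the frozen base model's next-token logits are shifted by reward-model scores, scaled by a weight the router predicts for that token position. This is a minimal illustration under assumed names and an assumed additive weighting scheme, not the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a logit vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def guided_next_token(base_logits, reward_scores, router_weight):
    """Greedy token choice from base logits shifted by weighted reward scores.

    router_weight plays the role of the (here hypothetical) token-level
    router output: 0 recovers the base model, larger values lean on the
    reward model more heavily.
    """
    combined = base_logits + router_weight * reward_scores
    return int(np.argmax(combined)), softmax(combined)

# Toy vocabulary of 4 tokens.
base_logits = np.array([2.0, 1.5, 0.5, 0.1])     # base model prefers token 0
reward_scores = np.array([-1.0, 1.0, 0.2, 0.0])  # reward model prefers token 1

# With router_weight = 1.0 the reward signal flips the greedy choice
# to token 1 (2.0 - 1.0 = 1.0 vs 1.5 + 1.0 = 2.5).
token, probs = guided_next_token(base_logits, reward_scores, router_weight=1.0)
print(token)  # 1

# With router_weight = 0.0 decoding reduces to the frozen base model.
token0, _ = guided_next_token(base_logits, reward_scores, router_weight=0.0)
print(token0)  # 0
```

Making `router_weight` a learned, per-token quantity rather than a fixed hyperparameter is the adaptive part: the router decides, token by token, how strongly the reward model should steer the base model.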