TTQ: Activation-Aware Test-Time Quantization to Accelerate LLM Inference On The Fly
arXiv cs.LG / 3/23/2026
📰 News · Tools & Practical Usage · Models & Research
Key Points
- The paper introduces TTQ, a test-time quantization framework that compresses large foundation models on the fly during inference, with no retraining required.
- It performs online calibration, making quantization activation-aware: the scheme adapts to each prompt and downstream task, which mitigates domain-shift issues (see the sketch after this list).
- TTQ speeds up inference by quantizing activations at runtime while matching or exceeding the performance of state-of-the-art baselines.
- Experiments show TTQ outperforming existing activation- and calibration-based quantization methods on large models.
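To make the mechanism concrete, here is a minimal NumPy sketch of one plausible reading of these key points: per-channel statistics are gathered from the current prompt's activations (online calibration) and used to rescale weights before low-bit quantization, in the spirit of activation-aware methods such as AWQ. The function names (`calibrate_scales`, `ttq_quantize_weights`, `ttq_matmul`) and the alpha-scaling rule are illustrative assumptions, not the paper's actual algorithm.

```python
# Minimal sketch of activation-aware test-time quantization.
# NOT the authors' implementation: the scaling rule and all names here
# are illustrative assumptions based on the key points above.
import numpy as np

def calibrate_scales(activations: np.ndarray) -> np.ndarray:
    """Online calibration: per-input-channel importance taken from the
    current prompt's activations (max absolute value per channel)."""
    return np.abs(activations).max(axis=0)  # shape: (in_features,)

def ttq_quantize_weights(weights: np.ndarray,
                         act_scales: np.ndarray,
                         n_bits: int = 4,
                         alpha: float = 0.5):
    """Activation-aware weight quantization (AWQ-style rescaling):
    scale up weight columns that see large activations so their
    quantization error shrinks, then quantize per output channel."""
    # Per-channel rescaling factor derived from activation statistics.
    s = np.clip(act_scales, 1e-5, None) ** alpha
    w_scaled = weights * s  # (out, in) * (in,) broadcasts over columns

    # Symmetric per-output-channel integer quantization.
    qmax = 2 ** (n_bits - 1) - 1
    w_absmax = np.abs(w_scaled).max(axis=1, keepdims=True)
    step = np.clip(w_absmax, 1e-8, None) / qmax
    q = np.clip(np.round(w_scaled / step), -qmax - 1, qmax)
    return q.astype(np.int8), step, s

def ttq_matmul(x, q, step, s):
    """Inference: fold the inverse activation scaling into the input,
    then multiply by the dequantized weights."""
    return (x / s) @ (q * step).T

# Toy check: quantize against statistics from a "prompt" whose
# activations have a few hot channels, then measure the error.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
X = rng.normal(size=(32, 16)) * (1 + 10 * (rng.random(16) > 0.8))
q, step, s = ttq_quantize_weights(W, calibrate_scales(X))
err = np.abs(X @ W.T - ttq_matmul(X, q, step, s)).mean()
print(f"mean abs error: {err:.4f}")
```

Because the rescaling is derived from the live input rather than a fixed calibration set, no offline calibration pass is needed, which is the property the second key point highlights.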