AI Navigate

Fuel Gauge: Estimating Chain-of-Thought Length Ahead of Time in Large Multimodal Models

arXiv cs.CV / 3/12/2026


Key Points

  • The paper introduces Fuel Gauge, a method to predict the length of the Chain-of-Thought ahead of time in large multimodal models by extracting a hidden 'fuel' signal.
  • It targets memory fragmentation and efficiency in LMM serving by enabling predictive KV cache allocation and by modulating CoT length to balance under- and over-thinking.
  • Extensive experiments across text-only, image-text, and video-text benchmarks show that Fuel Gauge achieves less than half the baseline's CoT length prediction error, which translates into a 13.37x reduction in memory allocation frequency on the GPQA-Diamond benchmark.
  • The results demonstrate generalizability and practical value for real-world LMM deployment, with potential improvements in both resource use and reasoning quality.

Abstract

Reasoning Large Multi-modality Models (LMMs) have become the de facto choice for many applications. However, these models rely on a Chain-of-Thought (CoT) process that is lengthy and unpredictable at runtime, often resulting in inefficient use of computational resources (due to memory fragmentation) and sub-optimal accuracy (due to under- and over-thinking). We observe empirically that the CoT process follows a very simple form, whose behavior is independent of the specific generated samples. This suggests that the CoT length can be estimated ahead of time based on a hidden parameter representing the amount of "fuel" available to support the reasoning process. Based on this insight, we propose Fuel Gauge, the first method which extracts this hidden signal and predicts CoT length ahead of time. We demonstrate the utility of Fuel Gauge on two downstream tasks: predictive KV cache allocation, which addresses memory fragmentation in LMM serving systems, and CoT length modulation, which mitigates under-thinking and over-thinking. Extensive experiments on LMMs across text-only, image-text, and video-text question answering benchmarks demonstrate the effectiveness, generalizability, and practical value of our Fuel Gauge. For example, on the GPQA-Diamond benchmark, our Fuel Gauge achieves less than half the CoT length prediction error compared to the baseline; this translates into a 13.37x reduction in the memory allocation frequency.
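To make the claimed memory-allocation benefit concrete, here is a minimal sketch of how an ahead-of-time length estimate can reduce KV cache allocation events. This is not the paper's implementation: the block size and helper names are illustrative assumptions; the idea is simply that a serving system that knows the CoT length up front can reserve the cache once, instead of growing it block by block during generation.

```python
BLOCK = 256  # tokens per KV cache block (illustrative assumption)

def incremental_allocations(actual_len: int) -> int:
    """Blocks allocated on demand: one allocation event per block."""
    return -(-actual_len // BLOCK)  # ceiling division

def predictive_allocations(predicted_len: int, actual_len: int) -> int:
    """Reserve blocks for the predicted length up front; fall back to
    incremental growth only if the prediction was too short."""
    events = 1  # a single up-front reservation
    reserved = -(-predicted_len // BLOCK) * BLOCK
    if actual_len > reserved:
        # prediction undershot: grow block by block for the remainder
        events += -(-(actual_len - reserved) // BLOCK)
    return events

# With an accurate length estimate, allocation events collapse to one.
print(incremental_allocations(2048))       # on-demand growth: 8 events
print(predictive_allocations(2100, 2048))  # predictive reservation: 1 event
```

The ratio between the two counts is the kind of "memory allocation frequency" reduction the paper reports; the better the length prediction, the closer the predictive path stays to a single allocation event.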