CROP: Token-Efficient Reasoning in Large Language Models via Regularized Prompt Optimization

arXiv cs.CL / 4/17/2026


Key Points

  • The paper proposes CROP (Cost-Regularized Optimization of Prompts), an automatic prompt optimization method that reduces LLM token usage by explicitly regularizing response length during optimization.
  • Unlike existing APO approaches that optimize only task accuracy, CROP adds feedback based on response length, encouraging the model to output concise answers with only critical reasoning.
  • Experiments on GSM8K, LogiQA, and BIG-Bench Hard show a reported 80.6% reduction in token consumption while maintaining competitive accuracy, with only a nominal performance drop.
  • The authors position CROP as a practical technique for deploying token-efficient, cost-effective agentic AI systems in production pipelines where latency and token cost matter.

Abstract

Large Language Models that employ reasoning techniques improve task performance but incur significant latency and token costs due to verbose generation. Existing automatic prompt optimization (APO) frameworks target task accuracy exclusively, at the expense of generating long reasoning traces. We propose Cost-Regularized Optimization of Prompts (CROP), an APO method that regularizes response length by generating textual feedback on verbosity in addition to standard accuracy feedback. This steers the optimization process toward prompts that elicit concise responses containing only critical information and reasoning. We evaluate our approach on complex reasoning datasets, specifically GSM8K, LogiQA, and BIG-Bench Hard, and achieve an 80.6% reduction in token consumption while maintaining competitive accuracy, with only a nominal decline in performance. This presents a pragmatic solution for deploying token-efficient and cost-effective agentic AI systems in production pipelines.
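The paper describes CROP as using textual feedback on both accuracy and response length during prompt optimization. The details of that feedback loop are not given here, but the core idea of a cost-regularized selection criterion can be sketched numerically. The sketch below is an illustrative analogue, not the paper's implementation: the candidate prompts, the `score` scalarization, and the `lam` weight are all assumptions.

```python
# Minimal sketch of cost-regularized prompt selection (illustrative only).
# CROP itself operates via textual feedback; here a scalar penalty on
# average response length stands in for that signal.

def score(accuracy: float, avg_tokens: float, lam: float = 0.001) -> float:
    """Regularized objective: task accuracy minus a length penalty.

    lam is a hypothetical trade-off weight; larger values favor
    shorter responses over marginal accuracy gains.
    """
    return accuracy - lam * avg_tokens

def select_prompt(candidates: list[dict]) -> dict:
    """Pick the candidate prompt with the best regularized score."""
    return max(candidates, key=lambda c: score(c["accuracy"], c["avg_tokens"]))

# Hypothetical evaluation results for two candidate prompts.
candidates = [
    {"prompt": "Think step by step in detail.", "accuracy": 0.92, "avg_tokens": 600},
    {"prompt": "Answer with only the critical steps.", "accuracy": 0.90, "avg_tokens": 120},
]

best = select_prompt(candidates)
print(best["prompt"])  # the concise prompt wins despite slightly lower accuracy
```

Under this toy objective, the verbose prompt scores 0.92 − 0.6 = 0.32 while the concise one scores 0.90 − 0.12 = 0.78, mirroring the paper's trade-off of a nominal accuracy drop for a large token saving.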