
ConCISE: A Reference-Free Conciseness Evaluation Metric for LLM-Generated Answers

arXiv cs.CL / 3/13/2026


Key Points

  • The paper presents a reference-free metric to evaluate the conciseness of LLM-generated answers without relying on gold-standard references.
  • It measures conciseness using three components: compression against abstractive summaries, compression against extractive summaries, and a word-removal compression score derived from how many non-essential words an LLM can remove while preserving meaning.
  • The metric is designed to identify redundancy in LLM outputs and help reduce token costs in conversational AI systems.
  • Experimental results indicate the approach effectively detects redundancy and provides a practical, automated tool for evaluating response brevity without ground-truth annotations.
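The three-component average described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes whitespace tokenization as a stand-in for the paper's token counting, and it assumes the three LLM-reduced texts (abstractive summary, extractive summary, and word-removed version) have already been produced by separate LLM calls.

```python
def compression_ratio(original: str, reduced: str) -> float:
    """Fraction of tokens removed, using whitespace tokenization
    as a simplifying assumption. 0.0 = nothing removed, 1.0 = all removed."""
    orig_len = len(original.split())
    if orig_len == 0:
        return 0.0
    reduced_len = len(reduced.split())
    return max(0.0, 1.0 - reduced_len / orig_len)


def conciseness_score(response: str,
                      abstractive_summary: str,
                      extractive_summary: str,
                      word_removed_version: str) -> float:
    """Average of the three compression scores (function names are
    illustrative, not from the paper). Higher = more redundancy."""
    scores = [
        compression_ratio(response, abstractive_summary),
        compression_ratio(response, extractive_summary),
        compression_ratio(response, word_removed_version),
    ]
    return sum(scores) / len(scores)
```

A perfectly terse response, where no summary or word removal can shorten it, scores 0.0; a highly redundant one approaches 1.0.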

Abstract

Large language models (LLMs) frequently generate responses that are lengthy and verbose, filled with redundant or unnecessary details. This diminishes clarity and user satisfaction, and it increases costs for model developers, especially with well-known proprietary models that charge based on the number of output tokens. In this paper, we introduce a novel reference-free metric for evaluating the conciseness of responses generated by LLMs. Our method quantifies non-essential content without relying on gold-standard references and calculates the average of three scores: i) a compression ratio between the original response and an LLM abstractive summary; ii) a compression ratio between the original response and an LLM extractive summary; and iii) word-removal compression, where an LLM removes as many non-essential words as possible from the response while preserving its meaning, with the number of tokens removed indicating the conciseness score. Experimental results demonstrate that our proposed metric identifies redundancy in LLM outputs, offering a practical tool for automated evaluation of response brevity in conversational AI systems without the need for ground-truth human annotations.