LLM-ReSum: A Framework for LLM Reflective Summarization through Self-Evaluation
arXiv cs.CL · April 29, 2026
Key Points
- The paper finds that traditional lexical overlap metrics like ROUGE and BLEU correlate weakly (or even negatively) with human judgments of summary quality across multiple domains and document lengths.
- Task-specific neural metrics and LLM-based evaluators align much better with human assessments, especially for evaluating linguistic quality.
- Building on these results, the paper introduces LLM-ReSum, a self-reflective summarization framework that runs an LLM-driven evaluate-and-rewrite loop, requiring no model fine-tuning.
- Experiments across three domains show LLM-ReSum can improve low-quality summaries by up to 33% in factual accuracy and 39% in coverage, with human evaluators preferring the refined summaries in 89% of cases.
- The work also releases PatentSumEval, a new human-annotated benchmark for legal document summarization with 180 expert-evaluated summaries, along with plans to publish code and datasets on GitHub.
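The evaluate-and-rewrite loop can be sketched as follows. This is a minimal illustration, not the paper's implementation: the actual prompts, scoring rubric, and stopping criteria are not given here, and `call_llm` is a hypothetical placeholder you would swap for a real LLM API client. The toy stub below only demonstrates the control flow.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call (hypothetical stub).

    Toy behavior for demonstration only: evaluation prompts return a
    1-5 score, rewrite prompts return a marked-up summary.
    """
    if prompt.startswith("EVALUATE"):
        return "4" if "[refined]" in prompt else "2"
    return "[refined] " + prompt.rsplit("SUMMARY:", 1)[-1].strip()

def evaluate(document: str, summary: str) -> int:
    """Ask the LLM to score a summary (1-5) against the source document."""
    return int(call_llm(f"EVALUATE\nDOC: {document}\nSUMMARY: {summary}"))

def rewrite(document: str, summary: str) -> str:
    """Ask the LLM to rewrite a low-scoring summary given the source."""
    return call_llm(f"REWRITE\nDOC: {document}\nSUMMARY: {summary}")

def resum(document: str, summary: str,
          threshold: int = 4, max_rounds: int = 3) -> str:
    """Self-reflective loop: rewrite until the score clears the threshold
    or the round budget is exhausted. No fine-tuning is involved; the
    same frozen model both evaluates and rewrites."""
    for _ in range(max_rounds):
        if evaluate(document, summary) >= threshold:
            break
        summary = rewrite(document, summary)
    return summary
```

With a real LLM behind `call_llm`, the evaluator prompt would encode the quality dimensions the paper measures (factual accuracy, coverage), and its feedback would typically be passed into the rewrite prompt rather than discarded as in this sketch.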
