ReFEree: Reference-Free and Fine-Grained Method for Evaluating Factual Consistency in Real-World Code Summarization
arXiv cs.CL / 4/14/2026
Key Points
- The paper introduces ReFEree, a reference-free, fine-grained method for evaluating factual consistency in real-world code summarization. It is designed to handle summaries that describe functionality over multiple sentences and that rely on dependency context.
- ReFEree defines code-summary-specific factual inconsistency criteria and evaluates them at a segment level using dependency information, then aggregates segment results into a fine-grained score.
- The authors construct a benchmark for code summarization with human-annotated factual consistency labels to support evaluation and comparison.
- In experiments, ReFEree achieves the highest correlation with human judgment among 13 baselines, a 15–18% improvement over the prior state of the art. The code and data are released publicly.
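
The segment-level evaluation and aggregation described in the key points can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sentence splitter, the `toy_judge` heuristic (which only checks that backtick-quoted identifiers appear in the code or its dependencies), and the aggregation rule (fraction of consistent segments) are all assumptions made for illustration. The paper's actual criteria and scoring formula are not given in this summary.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SegmentVerdict:
    segment: str
    consistent: bool


def split_into_segments(summary: str) -> List[str]:
    """Naive sentence splitter (assumption: one segment per sentence)."""
    parts = re.split(r"(?<=[.!?])\s+", summary.strip())
    return [p for p in parts if p]


def toy_judge(segment: str, code: str, dependencies: List[str]) -> bool:
    """Hypothetical stand-in for a real consistency judge.

    A segment counts as consistent if every backtick-quoted identifier
    it mentions appears in the code or in its dependency context.
    """
    context = code + "\n" + "\n".join(dependencies)
    identifiers = re.findall(r"`([^`]+)`", segment)
    return all(name in context for name in identifiers)


def score_summary(
    summary: str,
    code: str,
    dependencies: List[str],
    judge: Callable[[str, str, List[str]], bool],
) -> Tuple[float, List[SegmentVerdict]]:
    """Judge each segment, then aggregate into one score in [0, 1].

    Aggregation here is the fraction of consistent segments (an assumption).
    """
    verdicts = [
        SegmentVerdict(seg, judge(seg, code, dependencies))
        for seg in split_into_segments(summary)
    ]
    if not verdicts:
        return 0.0, verdicts
    score = sum(v.consistent for v in verdicts) / len(verdicts)
    return score, verdicts


# Example inputs (invented for illustration).
code = "def total(items):\n    return sum(price(i) for i in items)"
dependencies = ["def price(item):\n    return item.cost * 1.2"]
summary = (
    "Sums the result of `price` over all items. "
    "Writes the total to `audit_log`."
)

score, verdicts = score_summary(summary, code, dependencies, toy_judge)
for v in verdicts:
    print(v.consistent, "|", v.segment)
print("score:", score)
```

In this toy run the first sentence is supported by the dependency context, while the second mentions an `audit_log` that appears nowhere in the code, so the per-segment verdicts pinpoint which sentence is unsupported instead of producing only a single summary-level label.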

