Are Finer Citations Always Better? Rethinking Granularity for Attributed Generation

arXiv cs.CL / 4/6/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper argues that citation granularity (sentence vs paragraph vs document) is a key design lever for attributed generation, but that simply choosing finer citations for human verifiability is not necessarily optimal for model performance.
  • Across four model scales (8B–120B), enforcing fine-grained (sentence-level) citations degrades attribution quality by 16% to 276% relative to the best-performing granularity setting.
  • The study finds a consistent optimum at intermediate granularity, with paragraph-level citations producing the highest attribution quality, while overly coarse citations add distracting noise.
  • The performance penalty for fine-grained constraints varies non-monotonically with model scale, with larger models being disproportionately harmed—suggesting sentence-level “atomic” citation units interfere with the multi-sentence semantic synthesis these models rely on.
  • It concludes that improving attribution requires aligning citation granularity with the model’s natural semantic scope, and that citation-optimal granularity can substantially improve attribution while preserving or even improving answer correctness.
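The granularity lever the paper studies can be made concrete: before generation, the retrieved evidence is segmented into the units the model is allowed to cite. The sketch below is illustrative only (the function name, unit-ID scheme, and naive sentence splitter are assumptions, not from the paper); it shows how the same document yields very different citation-unit inventories at each granularity.

```python
import re

def citation_units(doc_id: str, text: str, granularity: str) -> list[tuple[str, str]]:
    """Return (unit_id, unit_text) pairs the model may cite.

    Hypothetical helper: the ID scheme (d0.p1.s2) is made up for
    illustration; real attributed-generation pipelines define their own.
    """
    if granularity == "document":
        return [(doc_id, text)]
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    if granularity == "paragraph":
        return [(f"{doc_id}.p{i}", p) for i, p in enumerate(paragraphs)]
    if granularity == "sentence":
        units = []
        for i, p in enumerate(paragraphs):
            # Naive splitter on end punctuation; production systems
            # would use a proper sentence tokenizer.
            for j, s in enumerate(re.split(r"(?<=[.!?])\s+", p)):
                units.append((f"{doc_id}.p{i}.s{j}", s))
        return units
    raise ValueError(f"unknown granularity: {granularity}")

doc = "Solar output rose in 2023. Wind grew too.\n\nHydro stayed flat."
print(len(citation_units("d0", doc, "document")))   # 1 unit
print(len(citation_units("d0", doc, "paragraph")))  # 2 units
print(len(citation_units("d0", doc, "sentence")))   # 3 units
```

The paper's claim, in these terms, is that forcing the model to attach evidence at the finest unit list is not the best choice: the intermediate (paragraph) inventory tends to match the multi-sentence span a claim actually draws on.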

Abstract

Citation granularity (whether to cite individual sentences, paragraphs, or documents) is a critical design choice in attributed generation. While fine-grained citations are often preferred for precise human verification, their impact on model performance remains under-explored. We analyze four model scales (8B-120B) and demonstrate that enforcing fine-grained citations degrades attribution quality by 16-276% compared to the best-performing granularity. We observe a consistent performance pattern where attribution quality peaks at intermediate granularities (paragraph-level). Our analysis suggests that fine-grained (sentence-level) citations disrupt necessary semantic dependencies for attributing evidence to answer claims, while excessively coarse citations (multi-paragraph) introduce distracting noise. Importantly, the magnitude of this performance gap varies non-monotonically with model scale: fine-grained constraints disproportionately penalize larger models, suggesting that atomic citation units disrupt the multi-sentence information synthesis at which these models excel. Strikingly, citation-optimal granularity leads to substantial gains in attribution quality while preserving or even improving answer correctness. Overall, our findings demonstrate that optimizing solely for human verification via fine-grained citation disregards model constraints, compromising both attribution faithfulness and generation reliability. Instead, effective attribution requires aligning citation granularity with the model's natural semantic scope.