Defend: Automated Rebuttals for Peer Review with Minimal Author Guidance

arXiv cs.AI / 3/31/2026


Key Points

  • The paper introduces DEFEND, an author-in-the-loop LLM tool for generating rebuttals in peer review that emphasizes structured reasoning rather than fully free-form writing.
  • The authors find that direct LLM-based rebuttal generation often fails on factual correctness and targeted refutation, requiring better controls to keep outputs grounded.
  • DEFEND is compared with three baselines (direct rebuttal generation, segment-wise generation, and a sequential segment-wise approach without author intervention), with DEFEND and author-in-the-loop methods performing substantially better.
  • To support fine-grained evaluation, the work extends the ReviewCritique dataset with new annotations for review segmentation, deficiency/error types, rebuttal-action labels, and mappings to gold rebuttal segments.
  • Experimental results plus a user study indicate that segment-wise generation with minimal author intervention reduces author cognitive load while improving refutation quality.

Abstract

Rebuttal generation is a critical component of the peer review process for scientific papers, enabling authors to clarify misunderstandings, correct factual inaccuracies, and guide reviewers toward a more accurate evaluation. We observe that Large Language Models (LLMs) often struggle to perform targeted refutation and maintain accurate factual grounding when used directly for rebuttal generation, highlighting the need for structured reasoning and author intervention. To address this, we introduce DEFEND, an LLM-based tool designed to explicitly execute the underlying reasoning process of automated rebuttal generation while keeping the author in the loop. Rather than writing rebuttals from scratch, the author only needs to drive the reasoning process with minimal intervention, yielding an efficient approach with little effort and low cognitive load. We compare DEFEND against three other paradigms: (i) direct rebuttal generation using an LLM (DRG), (ii) segment-wise rebuttal generation using an LLM (SWRG), and (iii) a sequential approach (SA) to segment-wise rebuttal generation without author intervention. To enable fine-grained evaluation, we extend the ReviewCritique dataset with review segmentation, deficiency and error-type annotations, rebuttal-action labels, and mappings to gold rebuttal segments. Experimental results and a user study demonstrate that directly using LLMs performs poorly in factual correctness and targeted refutation, while segment-wise generation and the automated sequential approach with the author in the loop substantially improve factual correctness and strength of refutation.
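To make the segment-wise, author-in-the-loop workflow concrete, here is a minimal sketch of the general idea, not the paper's implementation: the review is split into segments, the author's only intervention is choosing a rebuttal action per segment, and a model drafts the text. The action labels, function names, and the stubbed `generate` call are all illustrative assumptions.

```python
# Illustrative sketch of segment-wise rebuttal generation with minimal
# author intervention. NOT the DEFEND implementation; the action set and
# the generate() stub are assumptions for demonstration only.

REBUTTAL_ACTIONS = ("refute", "clarify", "concede")  # hypothetical label set

def generate(segment: str, action: str) -> str:
    """Stub standing in for an LLM call; a real tool would prompt a model
    with the review segment, the chosen action, and paper context."""
    return f"[{action}] Response to: {segment}"

def segment_wise_rebuttal(review_segments, author_pick):
    """For each review segment, the author only supplies an action label
    (the 'minimal intervention'); the model drafts the rebuttal text."""
    rebuttal = []
    for seg in review_segments:
        action = author_pick(seg)
        if action not in REBUTTAL_ACTIONS:
            raise ValueError(f"unknown rebuttal action: {action}")
        rebuttal.append(generate(seg, action))
    return "\n\n".join(rebuttal)

# Example: the author marks one point for refutation, another for clarification.
segments = ["The baselines are outdated.", "Section 3 is hard to follow."]
picks = {"The baselines are outdated.": "refute",
         "Section 3 is hard to follow.": "clarify"}
draft = segment_wise_rebuttal(segments, picks.get)
```

The point of the sketch is the division of labor: the author's per-segment choice grounds each response in an intended action, rather than leaving the model to free-write an entire rebuttal.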