Where Do Prompt Perturbations Break Generation? A Segment-Level View of Robustness in LoRA-Tuned Language Models

arXiv cs.CL / 5/5/2026

📰 News · Models & Research

Key Points

  • The paper argues that common prompt-robustness methods, which enforce whole-sequence consistency, can miss a key failure mode: outputs that look globally similar to the clean response while drifting on a critical entity, relation, or conclusion.
  • It proposes S^2R^2, a segment-level robustness framework for LoRA fine-tuning that decomposes clean and perturbed generations into semantic segments and aligns them with an optimal-transport objective.
  • S^2R^2 penalizes only the segments with the largest meaning drift, and adds an adapter-stability regularizer that connects the output-side objective to model adaptation, using LoRA norm control as a tractable proxy.
  • The authors give a PAC-Bayesian argument that limiting adapter growth can improve transfer beyond the perturbations seen during training.
  • Experiments on summarization benchmarks show that S^2R^2 improves robustness to typographical noise, deletion, synonym replacement, and paraphrasing, while preserving competitive clean performance and improving cross-dataset transfer over consistency-based baselines.
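To make the segment-level idea concrete, here is a minimal numpy sketch of the alignment step as described in the key points, not the authors' actual implementation: embed each semantic segment of the clean and perturbed outputs, align the two segment sets with entropic optimal transport (a standard Sinkhorn solver stands in for whatever OT variant the paper uses), and penalize only the top-k most-drifted segments. All function names, the cosine-distance cost, and the `reg`/`top_k` values are illustrative assumptions.

```python
import numpy as np

def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Entropic OT plan between two uniform segment distributions.

    cost: (n, m) matrix of pairwise segment distances.
    Returns a transport plan whose rows sum to 1/n and columns to 1/m.
    """
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):          # alternating marginal scaling
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def segment_drift_loss(clean_segs, pert_segs, top_k=2):
    """Top-k segment drift between clean and perturbed generations.

    clean_segs: (n, d) embeddings of the clean output's segments.
    pert_segs:  (m, d) embeddings of the perturbed output's segments.
    """
    # Cosine distance as a per-segment-pair measure of meaning drift.
    cn = clean_segs / np.linalg.norm(clean_segs, axis=1, keepdims=True)
    pn = pert_segs / np.linalg.norm(pert_segs, axis=1, keepdims=True)
    cost = 1.0 - cn @ pn.T
    plan = sinkhorn_plan(cost)
    # Transport-weighted drift attributed to each clean segment
    # (rescaled by n so each row is a weighted average, not a 1/n share).
    per_segment = (plan * cost).sum(axis=1) * cost.shape[0]
    # Penalize only the k segments that drifted the most.
    worst = np.sort(per_segment)[-top_k:]
    return worst.mean()
```

In a fine-tuning loop, this loss would be added to the task loss, so gradients flow only through the few segments whose meaning moved the most, rather than diluting the signal across an entire globally-similar sequence.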

Abstract

Large language models are sensitive to minor prompt perturbations, yet existing robustness methods usually enforce consistency at the whole-sequence level. This holistic view can hide an important failure mode: a perturbed response may remain globally similar to the clean one while drifting on a critical entity, relation, or conclusion. We introduce S^2R^2, a segment-level framework for robust LoRA fine-tuning. S^2R^2 decomposes clean and perturbed generations into semantic segments, aligns them with an optimal-transport objective, and penalises the segments with the largest meaning drift. To connect this output-side objective with model adaptation, we add an adapter-stability regulariser motivated by segment-level attention reallocation, using LoRA norm control as a tractable proxy for limiting perturbation-amplified evidence shifts. A PAC-Bayesian complexity view further explains why controlling adapter growth may support transfer beyond observed perturbations. Experiments on summarisation benchmarks show that S^2R^2 improves robustness under typographical noise, deletion, synonym replacement, and paraphrasing, while maintaining competitive clean performance and stronger cross-dataset transfer than consistency-based baselines.
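The adapter-stability regularizer described in the abstract reduces, as a tractable proxy, to controlling the norm of the LoRA update. A minimal numpy sketch of that proxy, assuming adapters are stored as (A, B) factor pairs with adapted weight W = W0 + B @ A; the function name and the `lam` hyperparameter are illustrative, not from the paper:

```python
import numpy as np

def adapter_stability_penalty(adapters, lam=1e-3):
    """Frobenius-norm proxy for adapter growth.

    adapters: list of (A, B) LoRA factor pairs, where A is (r, d_in) and
    B is (d_out, r), so B @ A is the full-rank-d update to the base weight.
    Penalizing ||B @ A||_F keeps the tuned model close to the base model,
    the norm-control idea the paper motivates via a PAC-Bayesian view.
    """
    return lam * sum(np.linalg.norm(B @ A, ord="fro") for A, B in adapters)
```

The total training objective would then combine the task loss, the segment-level drift penalty, and this term, trading off clean performance against both output-side and parameter-side stability.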