Robust Regression with Adaptive Contamination in Response: Optimal Rates and Computational Barriers

arXiv stat.ML / 4/7/2026


Key Points

  • The paper studies robust regression when covariates are clean but responses can be adaptively corrupted, contrasting this with Huber’s classical contamination model.
  • It shows that clean covariate information enables strictly improved statistical estimation rates over the Huber setting and, unlike Huber contamination, can yield consistency even when the contamination fraction is a non-vanishing constant.
  • The authors prove a matching minimax lower bound using Fano’s inequality and contamination process constructions that generalize earlier two-point arguments to handle multiple distributions.
  • Even though the information-theoretic rate improves over Huber’s model, the paper establishes strong information–computation gaps via Statistical Query and Low-Degree Polynomial lower bounds, implying polynomial-time algorithms may not achieve the optimal information-theoretic performance.
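
For reference, the standard multi-hypothesis form of Fano's inequality mentioned in the key points can be written as follows. This is the textbook statement, not necessarily the exact form used in the paper; the symbols $\rho$, $\delta$, $J$, and $Z$ are notation assumed here for illustration.

```latex
% Textbook Fano bound (assumed notation, not taken from the paper).
% Let \theta_1, \dots, \theta_m be 2\delta-separated in a metric \rho,
% let J be uniform on \{1, \dots, m\}, and let Z \mid \{J = j\} \sim P_{\theta_j}.
\[
  \inf_{\hat\theta} \max_{1 \le j \le m}
  \mathbb{E}_{\theta_j}\!\left[ \rho(\hat\theta, \theta_j) \right]
  \;\ge\;
  \delta \left( 1 - \frac{I(Z; J) + \log 2}{\log m} \right).
\]
% If the observed distributions P_{\theta_1}, \dots, P_{\theta_m} all coincide,
% then I(Z; J) = 0 and the bound reduces to
\[
  \delta \left( 1 - \frac{\log 2}{\log m} \right),
\]
% which is strictly positive only when m > 2.
```

The second display shows why this form of the bound needs more than two hypotheses to be informative: with $m = 2$ indistinguishable hypotheses the right-hand side is zero.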

Abstract

We study robust regression under a contamination model in which covariates are clean while the responses may be corrupted in an adaptive manner. In the classical Huber contamination model, both covariates and responses may be contaminated, and consistent estimation is impossible when the contamination proportion is a non-vanishing constant. In contrast, the clean-covariate setting admits strictly improved statistical guarantees. Specifically, we show that the additional information in the clean covariates can be carefully exploited to construct an estimator that achieves a better estimation rate than that attainable under Huber contamination. Unlike in the Huber model, this improved rate implies consistency even when the contamination proportion is a constant. A matching minimax lower bound is established using Fano's inequality together with the construction of contamination processes that match m > 2 distributions simultaneously, extending the previous two-point lower bound argument in Huber's setting. Despite the improvement over the Huber model from an information-theoretic perspective, we provide formal evidence, in the form of Statistical Query and Low-Degree Polynomial lower bounds, that the problem exhibits strong information-computation gaps. Our results strongly suggest that the information-theoretic improvements cannot be achieved by polynomial-time algorithms, revealing a fundamental gap between information-theoretic and computational limits in robust regression with clean covariates.
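
To make the data model concrete, here is a minimal simulation sketch of response-only contamination. Everything in it is an assumption for illustration rather than the paper's construction: a Gaussian linear model, an adversary that inspects the covariates and overwrites the responses of the highest-norm points, and the hypothetical helper names `simulate_response_contamination` and `ols`. It does not implement the paper's estimator. It only shows that the covariates stay untouched while an adaptive attack on at most an `eps` fraction of responses biases ordinary least squares.

```python
import numpy as np


def simulate_response_contamination(n=2000, d=5, eps=0.1, sigma=1.0, seed=0):
    """Draw clean linear-model data, then corrupt at most eps*n responses.

    Assumed setup (illustrative only): y = X @ beta + Gaussian noise.
    The adversary sees X and picks the samples with the largest covariate
    norm, replacing their responses with ones generated from -beta.
    """
    rng = np.random.default_rng(seed)
    beta = np.ones(d)
    X = rng.standard_normal((n, d))  # covariates are never modified
    y_clean = X @ beta + sigma * rng.standard_normal(n)

    k = int(eps * n)  # contamination budget
    # Adaptive choice: the corrupted indices depend on the observed covariates.
    idx = np.argsort(-np.linalg.norm(X, axis=1))[:k]
    y_corrupt = y_clean.copy()
    y_corrupt[idx] = X[idx] @ (-beta)
    return X, y_clean, y_corrupt, beta, idx


def ols(X, y):
    """Ordinary least squares fit via numpy's least-squares solver."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


if __name__ == "__main__":
    X, y_clean, y_corrupt, beta, idx = simulate_response_contamination()
    err_clean = np.linalg.norm(ols(X, y_clean) - beta)
    err_corrupt = np.linalg.norm(ols(X, y_corrupt) - beta)
    print(f"OLS error on clean responses:     {err_clean:.3f}")
    print(f"OLS error on corrupted responses: {err_corrupt:.3f}")
```

Running the script prints the OLS estimation error with and without the attack. A robust estimator in this setting would aim to keep the error small despite the corrupted responses, using the fact that `X` can be trusted.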