Counterfactual Peptide Editing for Causal TCR–pMHC Binding Inference

arXiv cs.LG / April 16, 2026


Key Points

  • TCR–pMHC binding prediction models can suffer from shortcut learning, relying on spurious dataset correlations rather than the physical binding interface, which hurts generalization under stricter evaluation protocols.
  • The paper proposes Counterfactual Invariant Prediction (CIP), which creates biologically constrained counterfactual peptide edits and trains models to remain invariant to changes at non-anchor positions while becoming sensitive to disruptions at MHC anchor residues.
  • CIP improves out-of-distribution performance on a curated VDJdb-IEDB benchmark, reaching an AUROC of 0.831 and a counterfactual consistency (CFC) of 0.724 under a family-held-out protocol.
  • Compared with an unconstrained baseline, CIP reduces the shortcut index by 39.7%, and ablation results indicate that anchor-aware edit generation is the key driver of the OOD gains.
  • The authors frame CIP as a practical recipe for causally grounded TCR specificity modeling rather than purely correlation-based prediction.
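The anchor-aware edit generation highlighted above can be sketched in code. This is an illustrative simplification, not the paper's implementation: the residue groups, the edit policy, and the anchor positions (P2 and the C-terminus of a 9-mer, typical for many HLA class I alleles) are assumptions made for the example.

```python
# Hypothetical sketch of anchor-aware counterfactual peptide editing in the
# spirit of CIP: conservative edits at non-anchor positions (prediction should
# not move) vs. disruptive edits at MHC anchor positions (prediction should move).
import random

# Rough physicochemical classes standing in for a BLOSUM-style notion of
# "conservative" substitution (an assumption of this sketch).
CONSERVATIVE_GROUPS = [
    set("ILVM"),   # aliphatic / hydrophobic
    set("FYW"),    # aromatic
    set("KRH"),    # basic
    set("DE"),     # acidic
    set("STNQ"),   # polar
    set("AG"),     # small
]
ALL_AA = set("ACDEFGHIKLMNPQRSTVWY")

def conservative_substitutes(aa):
    """Residues in the same physicochemical group as aa (excluding aa itself)."""
    for group in CONSERVATIVE_GROUPS:
        if aa in group:
            return sorted(group - {aa})
    return []

def disruptive_substitutes(aa):
    """Residues outside aa's group: likely to disturb an anchor contact."""
    same = next((g for g in CONSERVATIVE_GROUPS if aa in g), {aa})
    return sorted(ALL_AA - same)

def make_edits(peptide, anchors, rng):
    """Return (invariance_edit, sensitivity_edit) counterfactual peptides."""
    non_anchors = [i for i in range(len(peptide))
                   if i not in anchors and conservative_substitutes(peptide[i])]
    i = rng.choice(non_anchors)          # conservative edit site (non-anchor)
    j = rng.choice(sorted(anchors))      # disruptive edit site (anchor)
    inv = peptide[:i] + rng.choice(conservative_substitutes(peptide[i])) + peptide[i+1:]
    sen = peptide[:j] + rng.choice(disruptive_substitutes(peptide[j])) + peptide[j+1:]
    return inv, sen

# Anchors {1, 8} (0-indexed) ~ P2 and the C-terminus of a 9-mer; illustrative only.
rng = random.Random(0)
inv, sen = make_edits("GILGFVFTL", anchors={1, 8}, rng=rng)
```

During training, each batch peptide would be paired with both edits: the invariance loss compares predictions on `peptide` and `inv`, while the contrastive term compares `peptide` and `sen`.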

Abstract

Neural models for TCR–pMHC binding prediction are susceptible to shortcut learning: they exploit spurious correlations in training data, such as peptide length bias or V-gene co-occurrence, rather than the physical binding interface. This renders predictions brittle under family-held-out and distance-aware evaluation, where such shortcuts do not transfer. We introduce *Counterfactual Invariant Prediction* (CIP), a training framework that generates biologically constrained counterfactual peptide edits and enforces invariance to edits at non-anchor positions while amplifying sensitivity at MHC anchor residues. CIP augments the base classifier with two auxiliary objectives: (1) an invariance loss penalizing prediction changes under conservative non-anchor substitutions, and (2) a contrastive loss encouraging large prediction changes under anchor-position disruptions. Evaluated on a curated VDJdb-IEDB benchmark under family-held-out, distance-aware, and random splits, CIP achieves AUROC 0.831 and counterfactual consistency (CFC) 0.724 under the challenging family-held-out protocol, a 39.7% reduction in shortcut index relative to the unconstrained baseline. Ablations confirm that anchor-aware edit generation is the dominant driver of OOD gains, providing a practical recipe for causally grounded TCR specificity modeling.
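The two auxiliary objectives described in the abstract can be written down as a single training loss. The sketch below is an assumed formulation, not the paper's: the loss weights (`lam_inv`, `lam_sen`), the squared-difference invariance penalty, the hinge margin, and the generic `model(tcr, peptide)` interface are all hypothetical choices made for illustration.

```python
# Hypothetical CIP-style objective: base BCE on the binding label, plus
# (1) an invariance penalty for conservative non-anchor edits and
# (2) a margin (contrastive) term demanding large shifts under anchor edits.
import torch
import torch.nn.functional as F

def cip_loss(model, tcr, peptide, y, pep_nonanchor_edit, pep_anchor_edit,
             lam_inv=1.0, lam_sen=1.0, margin=1.0):
    logit = model(tcr, peptide)
    logit_inv = model(tcr, pep_nonanchor_edit)  # conservative non-anchor edit
    logit_sen = model(tcr, pep_anchor_edit)     # disruptive anchor edit

    base = F.binary_cross_entropy_with_logits(logit, y)
    # (1) invariance: prediction should not move under non-anchor edits
    l_inv = (logit - logit_inv).pow(2).mean()
    # (2) sensitivity: anchor disruptions should shift the logit by >= margin
    l_sen = F.relu(margin - (logit - logit_sen).abs()).mean()
    return base + lam_inv * l_inv + lam_sen * l_sen
```

Any differentiable binding classifier that maps a (TCR, peptide) pair to a logit can be dropped in for `model`; the two edited peptides would come from an anchor-aware edit generator as described in the Key Points.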