CRC-Screen: Certified DNA-Synthesis Hazard Screening Under Taxonomic Shift

arXiv cs.AI / 5/4/2026


Key Points

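  • The paper finds that the common baseline of screening DNA orders by sequence matching against curated hazard lists collapses to a 100% false-flag rate when the hazardous sequence comes from a taxonomic family missing from the reference set: to satisfy the certified miss-rate constraint, the weakly discriminating signal forces the threshold below every benign test order.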
  • It proposes “CRC-Screen,” which combines three signals derived from a synthesis order's public annotation (k-mer Jaccard similarity to known toxins, a trimmed-mean score from a five-LLM judge panel, and cosine similarity to clustered embedding centroids) and fuses them with a monotone logistic aggregator.
  • Calibrated with Conformal Risk Control (CRC), the screener carries a statistical guarantee on the expected false-negative (miss) rate: it certifies E[FNR] ≤ α.
  • Experiments on ten leave-one-taxonomic-family-out folds (UniProt KW-0800 reviewed toxins, α=0.05) show 0% test miss rate on all folds and 0% test false-flag rate on 9 out of 10 folds.
  • The authors conclude that the binding constraint for procurement-grade guarantees (e.g., α=10^-3) is the amount of calibration data, not the algorithm: the finite-sample slack 1/(n_cal+1) caps the certifiable miss rate at 1.77% on their 200-hazard subsample, and reaching α=10^-3 requires an 18× larger calibration set.
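
The three signals and the fusion step described in the key points can be illustrated with a minimal Python sketch. This is not the authors' code. All function names, the k-mer length, the trim level, and the weights are hypothetical choices for illustration, and enforcing monotonicity through non-negative weights is an assumption, since the summary does not say how the aggregator is constrained.

```python
import math

def kmers(seq, k=3):
    """Set of all length-k substrings of a sequence (k=3 is an assumed value)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def kmer_jaccard(a, b, k=3):
    """Jaccard similarity between the k-mer sets of two sequences."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0

def trimmed_mean(scores, trim=1):
    """Mean after dropping the `trim` lowest and `trim` highest scores.

    Assumes len(scores) > 2 * trim, e.g. five judges with trim=1.
    """
    ordered = sorted(scores)
    kept = ordered[trim:len(ordered) - trim]
    return sum(kept) / len(kept)

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def fuse(signals, weights, bias):
    """Logistic aggregator over the signals.

    ASSUMPTION: non-negative weights are used here so that the fused score
    never decreases when any single signal increases.
    """
    assert all(w >= 0 for w in weights)
    z = bias + sum(w * s for w, s in zip(weights, signals))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical order: one value per signal, then a fused hazard score.
jaccard = kmer_jaccard("MKTLLVAG", "MKTLIVAG")
panel = trimmed_mean([0.1, 0.7, 0.8, 0.8, 1.0])
centroid = cosine([0.2, 0.9, 0.1], [0.1, 1.0, 0.0])
score = fuse([jaccard, panel, centroid], weights=[2.0, 3.0, 1.5], bias=-3.0)
```

A monotone aggregator keeps the fused score ordered consistently with each input signal, which is what allows a single scalar threshold to be calibrated afterwards.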

Abstract

DNA-synthesis providers screen incoming orders by searching the requested sequence against curated hazard lists. We show that this baseline collapses to a 100% false-flag rate when the hazardous sequence comes from a taxonomic family absent from the reference set: under Conformal Risk Control's certified miss-rate constraint, a low-discrimination signal forces the threshold below the entire test-benign mass. We compose three signals derived from a synthesis order's public annotation: k-mer Jaccard similarity to known toxins, the trimmed-mean score of a five-LLM judge panel, and cosine similarity to clustered embedding centroids. Fused under a monotone logistic aggregator and calibrated by Conformal Risk Control, the resulting screener certifies $\mathbb{E}[\mathrm{FNR}] \le \alpha$. Across ten leave-one-taxonomic-family-out folds at $\alpha=0.05$ on UniProt KW-0800 reviewed toxins, the calibrated screener achieves 0% test miss rate on every fold and 0% test false-flag rate on nine of ten folds. The bound's finite-sample slack $1/(n_{\mathrm{cal}}+1)$ caps the certifiable miss rate at 1.77% on our 200-hazard subsample; reaching procurement-grade $\alpha=10^{-3}$ requires an $18\times$ larger calibration set, which the full reviewed UniProt KW-0800 corpus is large enough to deliver. The binding constraint on certifiable DNA-synthesis screening is calibration data, not algorithms. Code: https://github.com/najmulhasan-code/crc-screen
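
The calibration step and the finite-sample arithmetic in the abstract can be sketched in a few lines of Python. This is an illustrative reading, not the repository's implementation. The function names are hypothetical, and the sketch assumes that higher scores mean "more hazardous", that an order is flagged when its score meets the threshold, and that each calibration hazard contributes a 0/1 miss loss. The effective calibration count used at the end is inferred from the reported 1.77% cap and is not stated in the abstract.

```python
import math

def crc_threshold(hazard_scores, alpha):
    """Largest flagging threshold whose CRC-adjusted miss rate is <= alpha.

    Assumes an order is flagged when score >= threshold, so a calibration
    hazard scoring strictly below the threshold is a miss (loss 1).
    """
    assert 0.0 < alpha < 1.0
    n = len(hazard_scores)
    # CRC condition for a loss bounded by 1: (misses + 1) / (n + 1) <= alpha.
    # The small epsilon guards against floating-point rounding at exact ties.
    max_misses = math.floor(alpha * (n + 1) + 1e-12) - 1
    if max_misses < 0:
        # Too few calibration hazards to certify alpha: flag everything.
        return float("-inf")
    # At most max_misses scores lie strictly below this order statistic.
    return sorted(hazard_scores)[max_misses]

def false_flag_rate(benign_scores, threshold):
    """Fraction of benign orders that would be flagged at this threshold."""
    return sum(s >= threshold for s in benign_scores) / len(benign_scores)

def min_calibration_hazards(alpha):
    """Smallest n whose finite-sample slack 1 / (n + 1) is <= alpha."""
    return math.ceil(1.0 / alpha - 1e-9) - 1

# Calibration hazards needed before alpha = 1e-3 is certifiable at all.
n_needed = min_calibration_hazards(1e-3)

# ASSUMPTION: effective calibration count implied by the reported 1.77% cap,
# obtained by solving 1 / (n + 1) = 0.0177 for n.
n_implied = 1.0 / 0.0177 - 1.0
scale_up = n_needed / n_implied
```

Under these assumptions, `n_needed` is 999 and `scale_up` comes out at roughly 18, which is consistent with the 18× figure quoted in the abstract. The sketch also shows the failure mode from the opening sentences: when the hazard scores barely separate from the benign ones, the certified threshold sits at or below the benign scores and `false_flag_rate` approaches 1.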