When W4A4 Breaks Camouflaged Object Detection: Token-Group Dual-Constraint Activation Quantization

arXiv cs.CV / 4/21/2026


Key Points

  • The paper studies post-training W4A4 (4-bit weights/4-bit activations) quantization for camouflaged object detection (COD) using Transformer-based models, showing a sharp “quantization cliff” that makes aggressive low-bit inference unusually difficult for COD.
  • It identifies the cause as token-local activation bottlenecks where heavy-tailed background tokens dominate a shared activation range, increasing quantization step size and causing weak but meaningful boundary cues to be mapped into the zero bin.
  • To fix this, the authors propose COD-TDQ, a COD-aware Token-group Dual-constraint activation Quantization method using Direct-Sum Token-Group (DSTG) token-group scaling and Dual-Constraint Range Projection (DCRP) to bound both the step-to-dispersion ratio and the zero-bin mass.
  • Experiments on four COD benchmarks with two baseline models (CFRN and ESCNet) show COD-TDQ improves the Sα score by more than 0.12 over the state-of-the-art quantization approach without retraining, and the code will be released.
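The paper's implementation is not yet released, but the two coupled steps described above can be sketched roughly as follows. Everything here is an assumption: the group size, the constraint thresholds, the clip-shrinking search, and all function names are illustrative, not the authors' DSTG/DCRP algorithm.

```python
import numpy as np

def quantize_group(x, clip, n_bits=4):
    """Symmetric uniform quantization of a token group to a given clip range."""
    qmax = 2 ** (n_bits - 1) - 1           # 7 for signed 4-bit
    step = clip / qmax
    q = np.clip(np.round(x / step), -qmax - 1, qmax)
    return q * step, step

def cod_tdq_sketch(acts, group_size=4, n_bits=4,
                   max_step_to_disp=0.5, max_zero_mass=0.3):
    """Hypothetical sketch of the mechanism: per-token-group scales
    (DSTG-like) plus a dual-constraint clip search (DCRP-like) that
    bounds both the step-to-dispersion ratio and the zero-bin mass.
    Thresholds and search strategy are illustrative assumptions."""
    out = np.empty_like(acts)
    for start in range(0, acts.shape[0], group_size):
        g = acts[start:start + group_size]   # one token group, own scale
        disp = g.std() + 1e-8                # dispersion proxy for the group
        clip = np.abs(g).max()               # start from the full range
        for _ in range(20):                  # shrink clip until both constraints hold
            deq, step = quantize_group(g, clip, n_bits)
            zero_mass = np.mean(deq == 0)    # fraction mapped to the zero bin
            if step / disp <= max_step_to_disp and zero_mass <= max_zero_mass:
                break
            clip *= 0.9                      # smaller clip -> smaller step
        out[start:start + group_size], _ = quantize_group(g, clip, n_bits)
    return out
```

Shrinking the clip trades clipping error on the largest values for a finer step, which is exactly the lever both constraints pull on: a smaller step lowers the step-to-dispersion ratio and empties the zero bin.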

Abstract

Camouflaged object detection (COD) segments objects that intentionally blend with the background, so predictions depend on subtle texture and boundary cues. COD is often needed under tight on-device memory and latency budgets, making low-bit inference highly desirable. However, COD is unusually hard to quantize aggressively. We study post-training W4A4 quantization of Transformer-based COD and find a task-specific cliff: heavy-tailed background tokens dominate a shared activation range, inflating the step size and pushing weak-but-structured boundary cues into the zero bin. This exposes a token-local bottleneck: remove cross-token range domination and bound the zero-bin mass under 4-bit activations. To address this, we introduce COD-TDQ, a COD-aware Token-group Dual-constraint activation Quantization method. COD-TDQ addresses this token-local bottleneck with two coupled steps: Direct-Sum Token-Group (DSTG) assigns token-group scales to suppress cross-token range domination, and Dual-Constraint Range Projection (DCRP) projects each token-group clip range to keep the step-to-dispersion ratio and the zero-bin mass bounded. Across four COD benchmarks and two baseline models (CFRN and ESCNet), COD-TDQ consistently achieves an Sα score more than 0.12 higher than that of the state-of-the-art quantization method without retraining. The code will be released.
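To make the "quantization cliff" concrete, here is a minimal numeric illustration (not from the paper) of the failure mode the abstract describes: one heavy-tailed background token sets the shared 4-bit scale, and the weak boundary cues of another token all round to the zero bin, whereas a per-token scale preserves them. The token values are invented for illustration.

```python
import numpy as np

def quantize(x, scale, n_bits=4):
    """Symmetric uniform quantizer: scale, round to the nearest step, clip."""
    qmax = 2 ** (n_bits - 1) - 1             # 7 for signed 4-bit
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

# Two illustrative "tokens": a heavy-tailed background token and a weak cue.
background = np.array([12.0, -9.0, 10.5, -11.0])
boundary   = np.array([0.4, -0.3, 0.5, -0.2])

# Shared (per-tensor) scale: the background token dominates the range,
# so the step is ~1.7 and every boundary value rounds to the zero bin.
shared_scale = max(np.abs(background).max(), np.abs(boundary).max()) / 7
shared_out = quantize(boundary, shared_scale)   # all zeros

# Per-token scale: the boundary token gets its own, ~24x finer step,
# and the weak cues survive quantization.
token_scale = np.abs(boundary).max() / 7
token_out = quantize(boundary, token_scale)     # nonzero, close to the input
```

This is the intuition behind assigning token-group scales: no single heavy-tailed token is allowed to inflate the step size for everyone else.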