Parameter-Efficient Architectural Modifications for Translation-Invariant CNNs
arXiv cs.CV / 5/1/2026
Key Points
- The paper argues that standard CNNs are not truly translation-invariant because spatially dependent fully connected layers make them vulnerable to even single-pixel shifts.
- It proposes a lightweight “Online Architecture” method that inserts Global Average Pooling (GAP) layers at multiple depths to decouple recognition from spatial location.
- In a VGG-16 case study, the modification cuts trainable parameters by 98% (5.2M → 82K) and total network size by 90% (138M → 14M) while maintaining competitive ImageNet Top-1 accuracy (66.4%).
- The approach improves translational robustness by reducing average relative loss (0.09 → 0.05), though the paper notes a remaining limitation from periodic aliasing introduced by discrete pooling.
- The authors extend the invariant CNNs to perceptual image quality assessment (LPIPS), showing stronger generalization (KADID-10k Spearman 0.89 vs. 0.75) and better alignment with human responses (RAID Spearman 0.95) than a retrained baseline.
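The core mechanism behind these key points can be illustrated with a small sketch. The following is a minimal NumPy toy example (not the paper's code; all names and shapes are made up for illustration) contrasting a Global-Average-Pooling head with a flatten-plus-fully-connected head: because GAP averages each channel over all spatial positions, a circular one-pixel shift of the feature map leaves its output unchanged, while the flattened FC head, whose weights are tied to specific positions, does not.

```python
import numpy as np

def gap_head(features, weights):
    """GAP head: average each channel over all spatial positions,
    then apply one linear layer.
    features: (C, H, W); weights: (num_classes, C)."""
    pooled = features.mean(axis=(1, 2))   # (C,) -- spatial location discarded
    return weights @ pooled               # (num_classes,)

def fc_head(features, weights):
    """Flatten-based head (as in the original VGG-16 classifier):
    every spatial position has its own weights, so a shift changes
    which weight each activation meets.
    weights: (num_classes, C*H*W)."""
    return weights @ features.ravel()

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))       # toy (C, H, W) feature map
shifted = np.roll(feat, shift=1, axis=2)    # one-pixel circular shift

w_gap = rng.standard_normal((3, 8))
w_fc = rng.standard_normal((3, 8 * 4 * 4))

# GAP output is identical under the shift; the FC output is not.
print(np.allclose(gap_head(feat, w_gap), gap_head(shifted, w_gap)))  # True
print(np.allclose(fc_head(feat, w_fc), fc_head(shifted, w_fc)))      # False
```

The circular shift used here preserves the channel means exactly; a real image shift introduces boundary effects and, after strided pooling, the periodic aliasing the paper flags as a remaining limitation. The parameter savings follow the same logic: the GAP head's weight matrix scales with the channel count rather than with channels × height × width.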