When Less Is More: Simplicity Beats Complexity for Physics-Constrained InSAR Phase Unwrapping

arXiv cs.CV / 5/5/2026


Key Points

  • Operational InSAR phase unwrapping is identified as the main computational bottleneck for volcanic and seismic monitoring, motivating improved architectures for the task.
  • The paper argues against the trend of using high-complexity computer-vision components like attention mechanisms without validating them for physics-constrained geophysical regression.
  • In a large-scale ablation study on the global LiCSAR benchmark (20 frames, 39,724 patches, 651M pixels), a vanilla U-Net (7.76M parameters) outperforms 11.37M-parameter attention-based models by 34% in R² and 51% in RMSE.
  • Physical analysis using power spectral density shows that attention models introduce unphysical high-frequency artifacts (>0.3 cycles/pixel), breaking the smoothness constraints expected from elastic surface deformation.
  • The simpler U-Net also meets operational requirements: it achieves 2.92 ms inference latency (a 2.5× speedup) and is the only candidate that comfortably stays under the 100 ms early-warning threshold. The authors have released their code publicly.
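The power-spectral-density argument in the key points can be made concrete with a small sketch. The snippet below is not the authors' analysis code. It is a minimal illustration, assuming a 2D field sampled on a regular pixel grid, of how one might measure the share of spectral power above 0.3 cycles/pixel. The function name `high_freq_power_fraction` and the synthetic test fields are hypothetical and chosen only for this example.

```python
import numpy as np


def high_freq_power_fraction(field, cutoff=0.3):
    """Fraction of spectral power at radial frequencies above `cutoff`.

    Frequencies are in cycles/pixel. This helper is a hypothetical
    illustration, not a function from the paper's repository.
    """
    field = np.asarray(field, dtype=float)
    field = field - field.mean()  # drop the DC component
    power = np.abs(np.fft.fft2(field)) ** 2

    # Frequency of every FFT bin along each axis, in cycles/pixel.
    fy = np.fft.fftfreq(field.shape[0])
    fx = np.fft.fftfreq(field.shape[1])
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)

    total = power.sum()
    if total == 0.0:
        return 0.0  # a constant field has no non-DC power
    return float(power[radius > cutoff].sum() / total)


# Synthetic stand-ins (assumptions): a smooth Gaussian bump plays the role of
# a deformation field, and a checkerboard plays the role of a pixel-scale artifact.
n = 64
yy, xx = np.mgrid[0:n, 0:n]
smooth = np.exp(-((yy - n / 2) ** 2 + (xx - n / 2) ** 2) / (2 * 8.0 ** 2))
checkerboard = np.where((yy + xx) % 2 == 0, 1.0, -1.0)

smooth_frac = high_freq_power_fraction(smooth)
noisy_frac = high_freq_power_fraction(smooth + 0.1 * checkerboard)
print(f"smooth field: {smooth_frac:.2e}")
print(f"with artifact: {noisy_frac:.2e}")
```

In this toy setting the smooth field carries almost no power above the cutoff, while the pixel-scale pattern pushes a visible share of the power there. That contrast is the kind of signature the summary describes for attention-based outputs.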

Abstract

Operational phase unwrapping is the primary computational bottleneck in InSAR-based volcanic and seismic monitoring. We challenge the industry trend of adopting high-complexity computer vision architectures, such as attention mechanisms, without validating their suitability for physics-constrained geophysical regression. We present the first large-scale architectural ablation study on a global LiCSAR benchmark (20 frames, 39,724 patches, 651M pixels). Our results reveal a significant "complexity penalty": a vanilla U-Net (7.76M parameters) achieves R² = 0.834 and RMSE = 1.01 cm, outperforming 11.37M-parameter attention-based models by 34% in R² and 51% in RMSE. Power Spectral Density (PSD) analysis provides the physical justification: while attention excels at capturing sharp semantic edges in natural images, it injects unphysical high-frequency artifacts (>0.3 cycles/pixel) into geophysical fields, violating the fundamental smoothness constraints of elastic surface deformation. With a 2.92 ms inference latency (a 2.5× speedup), the vanilla U-Net is the only candidate to comfortably meet the sub-100 ms requirement for operational early-warning systems. This work bridges the "publication-to-practice" gap by proving that convolutional locality outperforms modern complexity for smooth-field regression, advocating for physics-informed simplicity in ML4RS. Code available at https://github.com/prabhjotschugh/When-Less-is-More-InSAR-Phase-Unwrapping
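For readers less familiar with the two headline metrics, the following sketch shows the standard definitions of RMSE and the coefficient of determination R² for a regression target. It is a generic illustration, not code from the linked repository. The function names and the tiny arrays are assumptions made for this example, and the units of RMSE simply follow the units of the inputs.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the inputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant, so SS_tot is non-zero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Illustrative values only; these are not data from the paper.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
perfect = y_true.copy()
mean_only = np.full_like(y_true, y_true.mean())

print(r2_score(y_true, perfect), rmse(y_true, perfect))
print(r2_score(y_true, mean_only), rmse(y_true, mean_only))
```

A perfect prediction gives R² of 1 and RMSE of 0, while always predicting the mean gives R² of 0. This is why a higher R² and a lower RMSE both indicate a better fit.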