FMS$^2$: Unified Flow Matching for Segmentation and Synthesis of Thin Structures

arXiv cs.CV / 3/17/2026

Key Points

  • FMS$^2$ is a flow-matching framework for segmenting and synthesizing thin structures, built from two modules: SegFlow and SynFlow.
  • SegFlow recasts segmentation as continuous image-to-mask transport using a time-indexed velocity field and ODE integration, offering trajectory-level supervision without topology heads or heavy loss engineering.
  • SynFlow is a mask-conditioned generator that creates pixel-aligned image-mask pairs and enables controllable geometric variations like sparsity, width, and branching to reduce domain shift.
  • On five crack and vessel benchmarks, SegFlow alone raises mean IoU from 0.511 to 0.599 (+17.2%) and lowers Betti error from 82.145 to 51.524 (-37.3%).
  • With limited labels, augmenting SegFlow with SynFlow-generated pairs recovers near-full performance using only 25% of the real annotations and improves cross-domain IoU by about 0.11 on average, which suggests better generalization under domain shift.
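
To make the SegFlow idea concrete, here is a minimal sketch of flow matching as image-to-mask transport. It is not the authors' implementation. It assumes a straight-line path between image and mask, a single-channel image with the same shape as the mask, and a fixed-step Euler solver; none of these choices is stated in the summary. The names `interpolate`, `flow_matching_loss`, `integrate_euler`, and `oracle_velocity` are hypothetical, and the oracle stands in for a trained network.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Point at time t in [0, 1] on the straight-line path from x0 (image) to x1 (mask).
    The linear path is an assumption; the source does not specify the path."""
    return (1.0 - t) * x0 + t * x1

def flow_matching_loss(velocity_fn, x0, x1, t):
    """Mean squared error between the predicted velocity at (x_t, t) and the
    constant target velocity x1 - x0 of the straight-line path."""
    xt = interpolate(x0, x1, t)
    target = x1 - x0
    return float(np.mean((velocity_fn(xt, t) - target) ** 2))

def integrate_euler(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with forward Euler."""
    x = x0.astype(np.float64).copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x

# Toy pair: an 8x8 grayscale "image" and a mask holding a one-pixel-wide line.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
mask = np.zeros((8, 8))
mask[3, :] = 1.0

def oracle_velocity(x, t):
    """Hypothetical stand-in for a trained network: returns the exact
    target velocity for this single image-mask pair."""
    return mask - image

loss = flow_matching_loss(oracle_velocity, image, mask, t=0.5)
pred = integrate_euler(oracle_velocity, image, num_steps=10)
binary = (pred > 0.5).astype(np.uint8)  # threshold the transported state into a mask
```

In this toy setting the oracle's loss is zero and integration lands on the mask. A real model would be trained by sampling `t` at random and minimizing the same regression loss over many image-mask pairs, so the supervision acts along the whole trajectory rather than only on the final prediction.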

Abstract

Segmenting thin structures like infrastructure cracks and anatomical vessels is a task hampered by topology-sensitive geometry, high annotation costs, and poor generalization across domains. Existing methods address these challenges in isolation. We propose FMS$^2$, a flow-matching framework with two modules. (1) SegFlow is a 2.96M-parameter segmentation model built on a standard encoder-decoder backbone that recasts prediction as continuous image → mask transport. It learns a time-indexed velocity field with a flow-matching regression loss and outputs the mask via ODE integration, rather than supervising only end-state logits. This trajectory-level supervision improves thin-structure continuity and sharpness, compared with tuned topology-aware loss baselines, without auxiliary topology heads, post-processing, or multi-term loss engineering. (2) SynFlow is a mask-conditioned mask → image generator that produces pixel-aligned synthetic image-mask pairs. It injects mask geometry at multiple scales and emphasizes boundary bands via edge-aware gating, while a controllable mask generator expands sparsity, width, and branching regimes. On five crack and vessel benchmarks, SegFlow alone outperforms strong CNN, Transformer, Mamba, and generative baselines, improving the volumetric metric (mean IoU) from 0.511 to 0.599 (+17.2%) and reducing the topological metric (Betti matching error) from 82.145 to 51.524 (-37.3%). When training with limited labels, augmenting SegFlow with SynFlow-generated pairs recovers near-full performance using 25% of real annotations and improves cross-domain IoU by 0.11 on average. Unlike classical data augmentation that promotes invariance via label-preserving transforms, SynFlow provides pixel-aligned paired supervision with controllable structural shifts (e.g., sparsity, width, branching), which is particularly effective under domain shift.
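
The percentages in the abstract are relative changes over the baseline values, not absolute differences. The sketch below recomputes them and shows how IoU behaves on a one-pixel-wide structure. The helper names `iou` and `relative_change_percent` are hypothetical, the empty-mask convention is an assumption, and how the paper averages IoU across images or datasets is not specified here.

```python
import numpy as np

def iou(pred, target):
    """Intersection over union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(np.logical_and(pred, target).sum() / union)

def relative_change_percent(baseline, new):
    """Signed change of `new` relative to `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline

# Figures reported in the abstract.
iou_gain = relative_change_percent(0.511, 0.599)      # mean IoU
betti_change = relative_change_percent(82.145, 51.524)  # Betti matching error

# Toy thin structure: a horizontal line one pixel wide.
target = np.zeros((8, 8), dtype=bool)
target[3, :] = True

half = np.zeros_like(target)
half[3, :4] = True      # only the left half of the line is detected

shifted = np.zeros_like(target)
shifted[4, :] = True    # full line, but one row too low

iou_half = iou(half, target)
iou_shifted = iou(shifted, target)
```

Rounded to one decimal, `iou_gain` and `betti_change` match the reported +17.2% and -37.3%. In the toy example, a line displaced by a single row has no overlap with the target, which illustrates why overlap metrics are harsh on thin structures and why the abstract reports a topological metric alongside IoU.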