FTPFusion: Frequency-Aware Infrared and Visible Video Fusion with Temporal Perturbation
arXiv cs.CV / 4/3/2026
Key Points
- FTPFusion is a frequency-aware method for fusing infrared and visible videos that aims to improve both spatial detail and temporal stability, which are often in tension in existing approaches.
- The model splits features into high-frequency and low-frequency components, using sparse cross-modal spatio-temporal interaction for high-frequency motion/complementary details and a temporal perturbation strategy for robustness to flicker, jitter, and misalignment.
- FTPFusion introduces an offset-aware temporal consistency constraint to explicitly stabilize cross-frame representations when temporal disturbances occur.
- Experiments on multiple public benchmarks show FTPFusion outperforming state-of-the-art fusion methods on metrics covering both spatial fidelity and temporal consistency.
- The authors state that the source code will be released on GitHub, enabling replication and downstream research.
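The components above can be illustrated with a minimal NumPy sketch. Everything here is an assumption for intuition only, not the paper's implementation: the frequency split is approximated with a simple box filter (low-pass) plus residual (high-pass), the temporal perturbation with random flicker and pixel jitter, and the offset-aware consistency term as a mean-squared difference between consecutive frames after undoing a known shift. The function names and parameters are hypothetical.

```python
import numpy as np

def freq_split(feat, k=3):
    """Hypothetical frequency split: box-filter low-pass plus residual high-pass.
    The paper's actual decomposition may differ."""
    pad = k // 2
    padded = np.pad(feat, pad, mode="edge")
    low = np.zeros_like(feat, dtype=float)
    H, W = feat.shape
    for i in range(H):
        for j in range(W):
            low[i, j] = padded[i:i + k, j:j + k].mean()
    high = feat - low  # high-frequency residual carries edges and motion detail
    return low, high

def temporal_perturb(frames, rng, max_flicker=0.1, max_shift=1):
    """Illustrative temporal perturbation: random brightness flicker and
    integer-pixel jitter applied independently to each frame."""
    out = []
    for f in frames:
        scale = 1.0 + rng.uniform(-max_flicker, max_flicker)   # flicker
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)  # jitter
        out.append(np.roll(f * scale, shift=(dy, dx), axis=(0, 1)))
    return out

def offset_aware_consistency(feats, offsets):
    """Sketch of an offset-aware consistency loss: penalize change between
    consecutive frames after compensating each frame's known offset."""
    loss = 0.0
    for t in range(1, len(feats)):
        dy, dx = offsets[t]
        aligned = np.roll(feats[t], shift=(-dy, -dx), axis=(0, 1))
        loss += np.mean((aligned - feats[t - 1]) ** 2)
    return loss / (len(feats) - 1)
```

In this toy setup, training would apply `temporal_perturb` to the input clip while the consistency term encourages the fused features to remain stable once the injected offsets are undone, which is the intuition the paper's constraint formalizes.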