RayFormer: Modeling Inter- and Intra-Ray Similarity for NeRF-Based Video Snapshot Compressive Imaging

arXiv cs.CV / 5/1/2026

📰 NewsDeveloper Stack & InfrastructureModels & Research

共有:

Key Points

The paper targets video snapshot compressive imaging (SCI), aiming to reconstruct dynamic scenes from a single measurement, and notes that NeRF-based approaches often use random ray sampling that overlooks structural similarities.
It introduces a patch-level ray sampling strategy to better model scene content structure before applying any learning-based reconstruction.
The authors propose RayFormer, an Inter- and Intra-Ray Transformer that captures both inter-ray similarities among neighboring spatial points at the same depth and intra-ray correlations along each viewing ray.
By combining patch-level sampling with a total variation prior in the optimization objective, the method improves spatial smoothness and reduces artifacts.
Experiments on simulated and real-world scenes show the approach achieves state-of-the-the-art (SOTA) reconstruction quality.

Abstract

Video snapshot compressive imaging (SCI) enables the reconstruction of dynamic scenes from a single snapshot measurement. Recently, NeRF-based methods have shown promising reconstruction performance. However, such methods typically adopt random ray sampling strategies and fail to capture content structural similarities, resulting in limited reconstruction quality. To address these issues, we first propose a patch-level ray sampling strategy to enable the modeling of content structure. Then, we propose an Inter- and Intra-Ray Transformer (RayFormer) to capture the structural similarities, modeling both inter-ray similarities among spatially neighboring points at the same depth and intra-ray correlations between adjacent points along the viewing ray. Finally, benefiting from the patch-level sampling strategy, the total variation prior is incorporated into the objective function to enhance spatial smoothness and suppress artifacts. Experiments in both simulated and real-world scenes demonstrate that the proposed method achieves state-of-the-art (SOTA) reconstruction performance.

Why Autonomous Coding Agents Keep Failing — And What Actually Works

Dev.to

Text-to-image is easy. Chaining LLMs to generate, critique, and iterate on images autonomously is a routing nightmare. AgentSwarms now supports Image generation playground and creative media workflows!

Reddit r/artificial

Automating FDA Compliance: AI for Specialty Food Producers

Dev.to

Mistral's new flagship Medium 3.5 folds chat, reasoning, and code into one model

THE DECODER

I hate this group but not literally

Reddit r/LocalLLaMA

RayFormer: Modeling Inter- and Intra-Ray Similarity for NeRF-Based Video Snapshot Compressive Imaging

Key Points

Abstract

Related Articles

Why Autonomous Coding Agents Keep Failing — And What Actually Works

Text-to-image is easy. Chaining LLMs to generate, critique, and iterate on images autonomously is a routing nightmare. AgentSwarms now supports Image generation playground and creative media workflows!

Automating FDA Compliance: AI for Specialty Food Producers

Mistral's new flagship Medium 3.5 folds chat, reasoning, and code into one model

I hate this group but not literally

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer