Can We Build Scene Graphs, Not Classify Them? FlowSG: Progressive Image-Conditioned Scene Graph Generation with Flow Matching
arXiv cs.CV / 4/22/2026
Key Points
- FlowSG reframes Scene Graph Generation (SGG) as a progressive, generative task using continuous-time flow matching, rather than treating it as a one-shot classification problem.
- The method uses a VQ-VAE to quantize scene-graph representations into discrete tokens, then employs a graph Transformer to jointly evolve bounding boxes and categorical tokens via a velocity field and flow-conditioned message passing.
- Training combines flow-matching losses for geometric refinement with discrete-flow objectives for object and predicate tokens, enabling few-step inference.
- Experiments on Visual Genome (VG) and PSG, in both closed- and open-vocabulary settings, report consistent improvements in predicate recall, mean recall, and graph-level metrics, with an average gain of about 3 points over USG-Par.
- FlowSG is designed to be plug-and-play with standard detectors and segmenters, suggesting practical integration potential for image-conditioned scene graph synthesis.
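
To make the flow-matching idea in the points above concrete, here is a minimal sketch of the continuous (bounding-box) half only. It is not FlowSG's implementation: it assumes a straight-line path between a noise sample and the target box, which is a common flow-matching choice rather than something this summary confirms. The discrete token flow, the VQ-VAE, and the graph Transformer are left out. All function names, the box format, and the `oracle_velocity` stand-in for a learned network are hypothetical.

```python
import random

def interpolate(x0, x1, t):
    """Point at time t on the straight-line path from noise x0 to target x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Regression target for the straight-line path: the constant x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(pred_v, x0, x1):
    """Mean squared error between a predicted velocity and the target velocity."""
    tgt = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(pred_v, tgt)) / len(tgt)

def oracle_velocity(x, t, x1):
    """Hypothetical stand-in for a trained network: the exact conditional
    velocity (x1 - x) / (1 - t) for the straight-line path. Requires t < 1."""
    return [(b - a) / (1.0 - t) for a, b in zip(x, x1)]

def euler_sample(x0, velocity_fn, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # the last evaluated time is 1 - dt, so t never reaches 1
        v = velocity_fn(x, t)
        x = [a + dt * b for a, b in zip(x, v)]
    return x

random.seed(0)
gt_box = [0.25, 0.40, 0.30, 0.20]  # assumed normalized (cx, cy, w, h)
noise_box = [random.gauss(0.0, 1.0) for _ in gt_box]

# Few-step inference: four Euler steps carry the noise sample to the box.
refined = euler_sample(
    noise_box, lambda x, t: oracle_velocity(x, t, gt_box), num_steps=4
)
```

In training, a network would be fit so that its prediction at `interpolate(x0, x1, t)` minimizes `flow_matching_loss`; at inference, that network would replace `oracle_velocity`, and the number of Euler steps would trade accuracy against speed.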



