CAGE-SGG: Counterfactual Active Graph Evidence for Open-Vocabulary Scene Graph Generation
arXiv cs.CV / 4/27/2026
📰 News · Models & Research
Key Points
- The paper addresses a key reliability problem in open-vocabulary scene graph generation: relation predictions can be biased by language priors or object co-occurrence rather than grounded visual evidence.
- It introduces CAGE-SGG, an evidence-grounded framework that verifies candidate relations through counterfactual relation verification rather than accepting language-plausible proposals directly.
- The method generates open-vocabulary relation candidates with a vision-language proposer, decomposes predicate phrases into soft evidence bases (e.g., support, contact, containment, depth, motion, state), and uses a relation-conditioned evidence encoder to extract predicate-relevant cues.
- A counterfactual verifier checks whether the relation score drops when necessary evidence is removed and stays stable under irrelevant perturbations, improving grounding reliability.
- Experiments across multiple SGG benchmarks show consistent gains in recall metrics, unseen-predicate generalization, and counterfactual grounding quality, arguing that “relation verification” is more reliable and interpretable than “relation generation.”
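The counterfactual check described above can be illustrated with a minimal sketch. This is not the paper's implementation; the scoring function, evidence keys, and thresholds below are hypothetical placeholders. The idea is that a grounded relation score should drop when necessary evidence (e.g., support or contact cues) is masked out, and should stay stable when irrelevant evidence is perturbed:

```python
def verify_relation(score_fn, features, necessary_keys, irrelevant_keys,
                    drop_threshold=0.3, stability_threshold=0.1):
    """Hypothetical counterfactual relation verifier (illustrative only).

    score_fn maps a dict of evidence features to a relation score in [0, 1].
    A candidate relation passes only if masking necessary evidence lowers the
    score, while perturbing irrelevant evidence barely changes it.
    """
    base = score_fn(features)

    # Counterfactual 1: remove evidence the predicate should depend on.
    masked = {k: (0.0 if k in necessary_keys else v)
              for k, v in features.items()}
    drop = base - score_fn(masked)

    # Counterfactual 2: perturb evidence the predicate should ignore.
    perturbed = {k: (v + 0.5 if k in irrelevant_keys else v)
                 for k, v in features.items()}
    shift = abs(base - score_fn(perturbed))

    return drop >= drop_threshold and shift <= stability_threshold


# Toy example: "on top of" should depend on support/contact, not color.
def toy_score(f):
    return min(1.0, 0.6 * f["support"] + 0.4 * f["contact"])

feats = {"support": 0.9, "contact": 0.8, "color_sim": 0.2}
print(verify_relation(toy_score, feats,
                      necessary_keys={"support", "contact"},
                      irrelevant_keys={"color_sim"}))  # → True
```

A score function driven by a spurious cue (say, object co-occurrence or color similarity alone) would fail the first check: masking support and contact would leave its score unchanged, so the drop never clears the threshold.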