Ontology-Guided Diffusion for Zero-Shot Visual Sim2Real Transfer

arXiv cs.CV / 3/20/2026

📰 NewsIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper introduces Ontology-Guided Diffusion (OGD), a neuro-symbolic zero-shot framework for sim2real image translation that represents realism as structured knowledge via an ontology and knowledge graph.
OGD decomposes realism into interpretable traits (e.g., lighting and material properties) and uses a graph neural network to produce a global embedding that conditions a pretrained diffusion model through cross-attention.
A symbolic planner translates ontology traits into a sequence of visual edits, enabling structured instruction prompts that guide the diffusion process toward reduced realism gap.
Across benchmarks, OGD better distinguishes real from synthetic images than baselines and achieves state-of-the-art performance in sim2real translation, demonstrating data efficiency and interpretability.
The work shows that explicitly encoding realism structure can enable generalizable zero-shot sim2real transfer with broader applicability to vision synthesis.

Abstract

Bridging the simulation-to-reality (sim2real) gap remains challenging as labelled real-world data is scarce. Existing diffusion-based approaches rely on unstructured prompts or statistical alignment, which do not capture the structured factors that make images look real. We introduce Ontology- Guided Diffusion (OGD), a neuro-symbolic zero-shot sim2real image translation framework that represents realism as structured knowledge. OGD decomposes realism into an ontology of interpretable traits -- such as lighting and material properties -- and encodes their relationships in a knowledge graph. From a synthetic image, OGD infers trait activations and uses a graph neural network to produce a global embedding. In parallel, a symbolic planner uses the ontology traits to compute a consistent sequence of visual edits needed to narrow the realism gap. The graph embedding conditions a pretrained instruction-guided diffusion model via cross-attention, while the planned edits are converted into a structured instruction prompt. Across benchmarks, our graph-based embeddings better distinguish real from synthetic imagery than baselines, and OGD outperforms state-of-the-art diffusion methods in sim2real image translations. Overall, OGD shows that explicitly encoding realism structure enables interpretable, data-efficient, and generalisable zero-shot sim2real transfer.

How AI is Transforming Dynamics 365 Business Central

Dev.to

Algorithmic Gaslighting: A Formal Legal Template to Fight AI Safety Pivots That Cause Psychological Harm

Reddit r/artificial

Do I need different approaches for different types of business information errors?

Dev.to

ShieldCortex: What We Learned Protecting AI Agent Memory

Dev.to

How AI-Powered Revenue Intelligence Transforms B2B Sales Teams

Dev.to

Ontology-Guided Diffusion for Zero-Shot Visual Sim2Real Transfer

Key Points

Abstract

Related Articles

How AI is Transforming Dynamics 365 Business Central

Algorithmic Gaslighting: A Formal Legal Template to Fight AI Safety Pivots That Cause Psychological Harm

Do I need different approaches for different types of business information errors?

ShieldCortex: What We Learned Protecting AI Agent Memory

How AI-Powered Revenue Intelligence Transforms B2B Sales Teams

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer