Hierarchy-Guided Topology Latent Flow for Molecular Graph Generation

arXiv cs.LG / 3/31/2026

Key Points

  • The paper introduces Hierarchy-Guided Latent Topology Flow (HLTF), a planner–executor approach that explicitly generates molecular bond graphs together with 3D coordinates to better control topology feasibility.
  • HLTF uses a latent multi-scale planning mechanism for global context and a constraint-aware sampler to suppress common failure modes like valence violations, disconnections, and implausible ring structures.
  • On QM9, HLTF reports 98.8% atom stability and 92.9% valid-and-unique, and raises PoseBusters validity to 94.0% (+0.9 points over the strongest reported baseline).
  • On GEOM-DRUGS, HLTF achieves 85.5% validity and 85.0% valid-unique-novel without post-processing, and 92.2%/91.2% after standardized relaxation, within 0.9 points of the best post-processed baseline.
  • The authors argue that explicitly generating topology reduces “false-valid” molecules that pass RDKit sanitization but fail stricter chemical validity checks.
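
Two of the failure modes named above, valence violations and disconnections, can be illustrated with a small pure-Python check on an explicit bond graph. This is a minimal sketch under stated assumptions, not HLTF's constraint-aware sampler: the `MAX_VALENCE` table and the function names `valence_violations`, `is_connected`, and `is_topology_feasible` are hypothetical, the valence limits cover only neutral H, C, N, O, and F, and aromatic or charged species are not handled.

```python
from collections import defaultdict

# Assumed maximum valences for neutral atoms (illustrative, not from the paper).
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1}


def valence_violations(atoms, bonds):
    """Return indices of atoms whose summed bond order exceeds MAX_VALENCE.

    atoms: list of element symbols, e.g. ["C", "O"]
    bonds: list of (i, j, order) tuples with integer bond orders
    """
    used = defaultdict(int)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    return [k for k, sym in enumerate(atoms) if used[k] > MAX_VALENCE[sym]]


def is_connected(n_atoms, bonds):
    """Return True if the bond graph forms a single connected component."""
    if n_atoms == 0:
        return True
    adj = defaultdict(list)
    for i, j, _ in bonds:
        adj[i].append(j)
        adj[j].append(i)
    # Depth-first search starting from atom 0.
    seen, stack = {0}, [0]
    while stack:
        node = stack.pop()
        for neighbor in adj[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == n_atoms


def is_topology_feasible(atoms, bonds):
    """Combine both checks: no over-valent atom and no disconnected fragment."""
    return not valence_violations(atoms, bonds) and is_connected(len(atoms), bonds)


# Methane: one carbon bonded to four hydrogens.
methane_atoms = ["C", "H", "H", "H", "H"]
methane_bonds = [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1)]
print(is_topology_feasible(methane_atoms, methane_bonds))  # True
```

A single extra bond is enough to flip the result: a neutral oxygen with three single bonds is flagged by `valence_violations`, which is the kind of small local error that the key points describe as causing a global failure.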

Abstract

Generating chemically valid 3D molecules is hindered by discrete bond topology: small local bond errors can cause global failures (valence violations, disconnections, implausible rings), especially for drug-like molecules with long-range constraints. Many unconditional 3D generators emphasize coordinates and then infer bonds or rely on post-processing, leaving topology feasibility weakly controlled. We propose Hierarchy-Guided Latent Topology Flow (HLTF), a planner-executor model that generates bond graphs with 3D coordinates, using a latent multi-scale plan for global context and a constraint-aware sampler to suppress topology-driven failures. On QM9, HLTF achieves 98.8% atom stability and 92.9% valid-and-unique, improving PoseBusters validity to 94.0% (+0.9 over the strongest reported baseline). On GEOM-DRUGS, HLTF attains 85.5%/85.0% validity/valid-unique-novel without post-processing and 92.2%/91.2% after standardized relaxation, within 0.9 points of the best post-processed baseline. Explicit topology generation also reduces "false-valid" samples that pass RDKit sanitization but fail stricter checks.
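
The validity, valid-and-unique, and valid-unique-novel figures quoted above are set-based ratios over generated samples. The sketch below shows one common way such ratios are computed; it is an assumption, not the paper's evaluation code. In particular, the function name `generation_metrics`, the use of `None` to mark an invalid sample, and the choice of the total sample count as the denominator for all three ratios are hypothetical, and published benchmarks differ in these conventions.

```python
def generation_metrics(samples, training_set):
    """Compute validity-style ratios over generated molecules.

    samples: list of canonical molecule strings, with None for invalid samples
    training_set: iterable of canonical strings seen during training
    All ratios use the total number of samples as denominator (an assumption).
    """
    total = len(samples)
    if total == 0:
        raise ValueError("no samples to evaluate")
    valid = [s for s in samples if s is not None]
    unique = set(valid)                  # deduplicate valid molecules
    novel = unique - set(training_set)   # drop molecules seen in training
    return {
        "validity": len(valid) / total,
        "valid_unique": len(unique) / total,
        "valid_unique_novel": len(novel) / total,
    }


# Four samples: one duplicate, one invalid, one already in the training set.
metrics = generation_metrics(["CCO", "CCO", None, "CC"], {"CC"})
print(metrics)
# {'validity': 0.75, 'valid_unique': 0.5, 'valid_unique_novel': 0.25}
```

Under these definitions each ratio can only shrink or stay equal as filters are added, which matches the ordering of the reported GEOM-DRUGS pair (85.5% validity, 85.0% valid-unique-novel).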