DiffGraph: An Automated Agent-driven Model Merging Framework for In-the-Wild Text-to-Image Generation

arXiv cs.AI / 3/24/2026

Key Points

  • DiffGraph is presented as an agent-driven, graph-based framework for automatically merging online text-to-image (T2I) expert diffusion models to better match real-world user needs.
  • The method builds a scalable graph that registers and calibrates continuously growing online expert models into nodes, enabling dynamic composition.
  • For each user request, DiffGraph activates the most relevant subgraph(s) so that different experts can be flexibly combined to produce desired generations.
  • The paper reports extensive experiments supporting the method's efficacy and positions it against existing model-merging methods, which it describes as limited in leveraging abundant online expert resources.
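
The register-then-activate flow described in the key points can be illustrated with a toy sketch. This is not the paper's implementation: the summary does not specify how nodes are calibrated, how the agent selects a subgraph, or how experts are merged. Everything below (`ExpertNode`, `ExpertGraph`, the `tags` and `scale` fields, and tag overlap as a stand-in for the agent's relevance judgment) is a hypothetical assumption chosen only to make the idea concrete.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ExpertNode:
    """One registered expert model. All fields are hypothetical."""
    name: str
    tags: frozenset[str]
    scale: float = 1.0  # assumed placeholder for a calibrated merge strength


@dataclass
class ExpertGraph:
    nodes: dict[str, ExpertNode] = field(default_factory=dict)
    edges: dict[str, set[str]] = field(default_factory=dict)

    def register(self, node: ExpertNode) -> None:
        """Add a node and link it to existing nodes that share a tag."""
        self.edges[node.name] = set()
        for other in self.nodes.values():
            if node.tags & other.tags:
                self.edges[node.name].add(other.name)
                self.edges[other.name].add(node.name)
        self.nodes[node.name] = node

    def activate(self, request_tags: set[str], top_k: int = 2) -> dict[str, float]:
        """Pick up to top_k nodes by tag overlap; return normalized weights."""
        scored = []
        for node in self.nodes.values():
            overlap = len(node.tags & request_tags)
            if overlap > 0:
                scored.append((overlap * node.scale, node.name))
        # Highest score first; ties broken alphabetically for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        chosen = scored[:top_k]
        total = sum(score for score, _ in chosen)
        return {name: score / total for score, name in chosen}


graph = ExpertGraph()
graph.register(ExpertNode("anime-style", frozenset({"anime", "style"})))
graph.register(ExpertNode("watercolor-style", frozenset({"watercolor", "style"})))
graph.register(ExpertNode("cat-subject", frozenset({"cat", "subject"})))

weights = graph.activate({"anime", "cat"})
print(weights)  # {'anime-style': 0.5, 'cat-subject': 0.5}
```

In this sketch the returned weights would feed whatever merging step is used downstream. A real system would replace the tag-overlap score with the agent's selection logic and the fixed `scale` with an actual calibration procedure.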

Abstract

The rapid growth of the text-to-image (T2I) community has fostered a thriving online ecosystem of expert models, which are variants of pretrained diffusion models specialized for diverse generative abilities. Yet, existing model merging methods remain limited in fully leveraging abundant online expert resources and still struggle to meet diverse in-the-wild user needs. We present DiffGraph, a novel agent-driven graph-based model merging framework, which automatically harnesses online experts and flexibly merges them for diverse user needs. Our DiffGraph constructs a scalable graph and organizes ever-expanding online experts within it through node registration and calibration. Then, DiffGraph dynamically activates specific subgraphs based on user needs, enabling flexible combinations of different experts to achieve user-desired generation. Extensive experiments show the efficacy of our method.