Layout-Guided Controllable Pathology Image Generation with In-Context Diffusion Transformers
arXiv cs.CV / 3/17/2026
Key Points
- The work addresses controllable pathology image synthesis, noting that prior text-guided diffusion models offer coarse global control and lack fine-grained structural constraints.
- It introduces a scalable multi-agent LVLM annotation framework that combines image description, diagnostic step extraction, and automatic quality judgment to produce clinically aligned supervision at scale.
- It presents IC-DiT, a layout-aware diffusion transformer that fuses spatial layouts, textual descriptions, and visual embeddings with hierarchical multimodal attention to preserve morphology while maintaining global semantic coherence.
- Experiments on five histopathology datasets show IC-DiT achieves higher fidelity, stronger spatial controllability, and better diagnostic consistency, with generated images also boosting downstream tasks like cancer classification and survival analysis.
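As a rough illustration of the annotation framework in the second key point, the three stages (image description, diagnostic step extraction, quality judgment) can be read as a filter pipeline. The sketch below is only that reading, not the paper's implementation: the `Annotation` record, the `annotate` function, the three agent callables, and the `min_quality` threshold are all hypothetical names introduced here, and the real agents would be LVLM calls rather than plain functions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Annotation:
    """One accepted annotation (hypothetical record layout)."""
    image_id: str
    description: str
    diagnostic_steps: List[str]
    quality: float


def annotate(
    image_ids: Iterable[str],
    describe: Callable[[str], str],                 # stand-in for the description agent
    extract_steps: Callable[[str], List[str]],      # stand-in for the step-extraction agent
    judge: Callable[[str, List[str]], float],       # stand-in for the quality-judging agent
    min_quality: float = 0.5,                       # assumed acceptance threshold
) -> List[Annotation]:
    """Run the three stages in order and keep only annotations the judge accepts."""
    kept: List[Annotation] = []
    for image_id in image_ids:
        description = describe(image_id)
        steps = extract_steps(description)
        quality = judge(description, steps)
        if quality >= min_quality:
            kept.append(Annotation(image_id, description, steps, quality))
    return kept
```

Passing the agents in as callables keeps the sketch independent of any particular model or API; each one could be swapped for a real LVLM client without changing the loop.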