Founder effects shape the evolutionary dynamics of multimodality in open LLM families
arXiv cs.AI / 3/25/2026
Key Points
- The study examines how multimodal (vision-language) capabilities emerge over time in open LLM families using Hugging Face ModelBiome lineage and metadata from more than 1.8M model entries.
- Cross-modal (vision-language) work was widespread in the broader ecosystem before becoming common within major open LLM families: multimodal releases remained rare through 2023 and most of 2024, then rose sharply in 2024–2025.
- Across families, vision-language model (VLM) variants typically debut months after a family's first text-generation release, with observed lags ranging from about 1 month (Gemma) to over a year for several families and roughly 26 months for GLM.
- Lineage analysis finds weak transfer from text-generation parents to VLM descendants (only 0.218% of fine-tuning edges from text parents lead to VLMs), while most multimodal expansion occurs within existing VLM lineages (94.5% of VLM-child edges originate from VLM parents).
- Many VLM releases appear as “new roots” without recorded parents (~60%), and founder concentration patterns suggest punctuated adoption: rare founder events seed multimodality, followed by rapid within-lineage amplification and diversification.
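The edge fractions above can be illustrated with a small sketch. The data and field names below are toy placeholders, not the paper's actual ModelBiome records; the point is only to show how "share of text-parent edges leading to VLMs" and "share of VLM children with VLM parents" would be computed from a list of (parent_modality, child_modality) lineage edges:

```python
# Illustrative sketch of the two lineage-edge fractions described above.
# Toy data: each tuple is (parent_modality, child_modality), where
# "text" = text-only LM and "vlm" = vision-language model.
edges = [
    ("text", "text"), ("text", "text"), ("text", "vlm"),
    ("vlm", "vlm"), ("vlm", "vlm"), ("vlm", "vlm"), ("text", "text"),
]

# Fraction of edges from text-generation parents that yield VLM children
# (the paper reports ~0.218% on the real graph; here it's toy data).
text_edges = [e for e in edges if e[0] == "text"]
text_to_vlm = sum(1 for p, c in text_edges if c == "vlm") / len(text_edges)

# Fraction of VLM children whose parent is also a VLM
# (the paper reports 94.5% on the real graph).
vlm_children = [e for e in edges if e[1] == "vlm"]
vlm_from_vlm = sum(1 for p, c in vlm_children if p == "vlm") / len(vlm_children)

print(f"text-parent edges leading to VLMs: {text_to_vlm:.1%}")
print(f"VLM children with VLM parents:     {vlm_from_vlm:.1%}")
```

On the real lineage graph these two ratios diverging sharply (very low text→VLM transfer, very high VLM→VLM continuation) is what motivates the paper's "founder effect" framing.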