MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction
arXiv cs.CL / 5/1/2026
Key Points
- MiniCPM-o 4.5 is presented as a new multimodal LLM aiming for more human-like interaction by enabling real-time full-duplex, omni-modal communication rather than alternating turn-based phases.
- The article identifies two main bottlenecks in current multimodal systems—lack of timely input integration during generation and largely reactive behavior—and positions the new model as addressing both.
- The core technical contribution is Omni-Flow, a streaming framework that aligns multimodal inputs and outputs on a shared temporal axis so the model can perceive and respond simultaneously (a minimal concurrency sketch follows after this list).
- The 9B-parameter model is reported to be competitive with larger systems on vision-language benchmarks, to surpass certain models in omni-modal understanding, and to improve speech generation while boosting computational efficiency.
- The model is claimed to support real-time full-duplex omni-modal interaction on edge devices with under 12 GB of RAM, enabled by efficient architecture design and inference optimization (a rough memory estimate follows below).
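The summary gives no implementation details for Omni-Flow, but the full-duplex idea behind it can be sketched as two concurrent loops sharing one clock: a perceiver that keeps ingesting input chunks, and a generator that folds freshly arrived input into each decoding step instead of waiting for the user's turn to end. The sketch below is purely illustrative; the coroutine names, the queue-based timeline, and the timings are assumptions, not the authors' code.

```python
import asyncio
import itertools
import time

START = time.monotonic()

def now() -> float:
    """Seconds since session start: the shared temporal axis."""
    return time.monotonic() - START

async def perceive(timeline: asyncio.Queue) -> None:
    """Continuously ingest timestamped input chunks (stand-ins for audio/video frames)."""
    for i in itertools.count():
        await asyncio.sleep(0.1)  # pretend a new chunk arrives every 100 ms
        await timeline.put((now(), f"chunk-{i}"))

async def respond(timeline: asyncio.Queue) -> None:
    """Emit output tokens while draining input that arrived mid-response."""
    for t in range(20):
        # Before each token, fold in everything perceived since the last step.
        fresh = []
        while not timeline.empty():
            fresh.append(timeline.get_nowait())
        if fresh:
            print(f"[{now():5.2f}s] conditioning on {len(fresh)} new chunk(s)")
        await asyncio.sleep(0.05)  # pretend one decoding step takes 50 ms
        print(f"[{now():5.2f}s] token-{t}")

async def main() -> None:
    timeline: asyncio.Queue = asyncio.Queue()
    producer = asyncio.create_task(perceive(timeline))
    await respond(timeline)
    producer.cancel()

asyncio.run(main())
```

Perception and generation overlap in time here rather than alternating, which is exactly the distinction the key points draw against turn-based pipelines; a real system would replace the prints with encoder and decoder calls conditioned on the timestamped stream.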
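The sub-12 GB claim is not broken down in the summary, but back-of-envelope weight arithmetic (an assumption on my part, since the quantization scheme is not stated) shows why 9B parameters can plausibly fit: at 8-bit precision the weights alone take roughly 8.4 GiB, leaving headroom for the KV cache and activations.

```python
# Illustrative weight-memory estimate for a 9B-parameter model.
# Ignores KV cache, activations, and runtime overhead (all assumptions).
params = 9e9
for bits in (16, 8, 4):
    gib = params * bits / 8 / 2**30
    print(f"{bits}-bit weights: ~{gib:.1f} GiB")
# 16-bit: ~16.8 GiB, 8-bit: ~8.4 GiB, 4-bit: ~4.2 GiB
```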
Related Articles

Why Autonomous Coding Agents Keep Failing — And What Actually Works
Dev.to

Text-to-image is easy. Chaining LLMs to generate, critique, and iterate on images autonomously is a routing nightmare. AgentSwarms now supports Image generation playground and creative media workflows!
Reddit r/artificial

Announcing the NVIDIA Nemotron 3 Super Build Contest
Dev.to

75% of Sites Blocking AI Bots Still Get Cited. Here Is Why Blocking Does Not Work.
Dev.to

Automating FDA Compliance: AI for Specialty Food Producers
Dev.to