MosaicMem: Hybrid Spatial Memory for Controllable Video World Models
arXiv cs.CV / March 19, 2026
Key Points
- MosaicMem introduces a hybrid spatial memory that lifts patches into 3D to improve localization and targeted retrieval while preserving the model's ability to follow prompts during generation.
- It uses a patch-and-compose interface to assemble spatially aligned patches in the queried view, preserving what should persist and allowing the model to inpaint what should evolve.
- The approach adds PRoPE camera conditioning and two memory-alignment methods, achieving better pose adherence than implicit memory and stronger dynamic modeling than explicit baselines.
- It enables minute-level navigation, memory-based scene editing, and autoregressive rollout, supporting long-horizon, memory-consistent video world modeling.
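The "lift patches into 3D" and "patch-and-compose" ideas above rest on standard pinhole-camera geometry: a patch observed with known depth and pose can be unprojected to a world point, then reprojected into any queried view to decide where it lands. The sketch below illustrates only that geometric core under assumed conventions (column-vector homogeneous coordinates, OpenCV-style intrinsics); the function names and interfaces are illustrative, not MosaicMem's actual API.

```python
import numpy as np

def unproject(uv, depth, K, cam_to_world):
    """Lift a 2D patch center (pixel) with known depth into a 3D world point.

    Assumes a pinhole camera with intrinsics K and a 4x4 camera-to-world pose.
    """
    u, v = uv
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    p_cam = np.array([x, y, depth, 1.0])  # homogeneous point in camera frame
    return (cam_to_world @ p_cam)[:3]

def project(p_world, K, world_to_cam):
    """Reproject a 3D world point into a queried view.

    Returns the pixel location, or None if the point is behind the camera.
    """
    p = world_to_cam @ np.append(p_world, 1.0)
    if p[2] <= 0:
        return None  # not visible from the queried view
    uv = K @ (p[:3] / p[2])
    return uv[:2]
```

A memory system in this spirit would unproject stored patches once, then call `project` per query to compose the spatially aligned patches in the new view, leaving unobserved regions for the generator to inpaint.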