Strip Qwen3.6 dense of its multimodal capabilities

Reddit r/LocalLLaMA / 4/30/2026

💬 Opinion · Signals & Early Trends · Ideas & Deep Analysis

Key Points

  • The post raises a technical question about whether removing a model’s multimodal components (e.g., image or voice processing) could reduce model size or improve inference speed.
  • It asks whether such changes are feasible in practice and whether the outcome differs between Mixture-of-Experts (MoE) architectures and dense models.
  • The author questions why this kind of “stripping” optimization is not already applied to widely used models.
  • The discussion is framed as a speculative, exploratory inquiry rather than reporting an experimental result or a concrete release.

This may be naive, but if we stripped a model of its image- and voice-processing capabilities, could that make it smaller or faster? Is that even possible? Does the answer differ between MoE and dense models?

If so, why isn't this done on popular models?
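For concreteness, here is a minimal sketch of what "stripping" could look like at the checkpoint level: filtering out the vision/audio tower weights by name and saving a text-only state dict. The prefixes `visual.` and `audio_tower.` are assumptions for illustration; weight naming varies per model, so you would inspect your own checkpoint's keys first, and the model config would need matching edits before the result loads cleanly.

```python
# Hypothetical sketch: produce a text-only checkpoint by dropping
# multimodal tower weights. Prefix names are assumptions -- inspect
# your own state dict (e.g. print a few keys) before filtering.
from safetensors.torch import load_file, save_file

# Assumed prefixes for the vision and audio towers; these differ per model.
MULTIMODAL_PREFIXES = ("visual.", "audio_tower.")

state_dict = load_file("model.safetensors")

# Keep only tensors that do not belong to a multimodal tower.
text_only = {
    name: tensor
    for name, tensor in state_dict.items()
    if not name.startswith(MULTIMODAL_PREFIXES)
}

removed = len(state_dict) - len(text_only)
print(f"Dropped {removed} multimodal tensors, kept {len(text_only)}")
save_file(text_only, "model-text-only.safetensors")
```

One caveat worth noting: in many dense VLMs the vision encoder is small relative to the language model (often well under a tenth of the parameters), and it is typically not executed at all on text-only requests, so the savings would mostly be disk space and load time rather than per-token speed. MoE models are unlikely to differ much here, since the multimodal towers are usually dense components attached to the expert backbone rather than part of the experts themselves.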

submitted by /u/redblood252