This may be naive, but if we stripped a model of its image-processing or voice-processing capabilities, would that make it smaller or faster? Is that even possible? Does it vary between MoE and dense models?
If it is possible, why isn't it done on popular models?


