ReMix: Reinforcement routing for mixtures of LoRAs in LLM finetuning
arXiv cs.LG · 3/12/2026
📰 News · Models & Research
Key Points
- Mixture-of-LoRAs can suffer from imbalanced routing weights, causing only a few LoRAs to dominate and limiting expressivity.
- ReMix introduces non-learnable routing weights to keep all active LoRAs effective, preventing domination by a single LoRA.
- To train with non-learnable weights, ReMix uses an unbiased gradient estimator based on REINFORCE leave-one-out (RLOO), treating the supervision loss as the reward (see the sketch after this list).
- Extensive experiments show ReMix significantly outperforms state-of-the-art parameter-efficient fine-tuning methods with a comparable number of activated parameters.
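To make the key points concrete, here is a minimal PyTorch sketch of the idea as summarized above: a router samples which LoRA expert fires, the sampled expert contributes with a fixed (non-learnable) weight, and the router is trained with an RLOO estimator using the negative supervision loss as the reward. All names (`MoLoRALinear`, `rloo_step`, `select`) and details (one sampled expert per example, fixed weight 1.0, MSE loss) are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoLoRALinear(nn.Module):
    """Hypothetical frozen linear layer plus a mixture of LoRA experts.

    The router only decides WHICH expert fires; the combination weight of
    the chosen expert is a fixed constant (non-learnable), so no single
    LoRA can accumulate all of the routing mass.
    """

    def __init__(self, d_in, d_out, num_experts=4, rank=8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)          # pretrained, frozen
        self.A = nn.Parameter(0.01 * torch.randn(num_experts, d_in, rank))
        self.B = nn.Parameter(torch.zeros(num_experts, rank, d_out))
        self.router = nn.Linear(d_in, num_experts)      # selection logits only

    def select(self, x):
        """Sample one expert per example; return indices and log-probs."""
        dist = torch.distributions.Categorical(logits=self.router(x))
        idx = dist.sample()
        return idx, dist.log_prob(idx)                  # (batch,), (batch,)

    def forward(self, x, idx):
        # Fixed (non-learnable) weight of 1.0 on the sampled expert's delta.
        h = torch.bmm(x.unsqueeze(1), self.A[idx]).squeeze(1)      # (batch, rank)
        delta = torch.bmm(h.unsqueeze(1), self.B[idx]).squeeze(1)  # (batch, d_out)
        return self.base(x) + delta

def rloo_step(layer, x, y, k=4):
    """k sampled routings per example; leave-one-out baseline for the router."""
    losses, log_probs = [], []
    for _ in range(k):
        idx, logp = layer.select(x)
        per_example = F.mse_loss(layer(x, idx), y, reduction="none").mean(-1)
        losses.append(per_example)
        log_probs.append(logp)
    L = torch.stack(losses)                             # (k, batch)
    logp = torch.stack(log_probs)                       # (k, batch)
    reward = -L.detach()                                # reward = -supervision loss
    # Leave-one-out baseline: mean reward of the other k - 1 rollouts.
    baseline = (reward.sum(0, keepdim=True) - reward) / (k - 1)
    router_loss = -((reward - baseline) * logp).mean()  # unbiased REINFORCE term
    return L.mean() + router_loss                       # LoRAs get ordinary grads

# Usage: only the LoRA and router parameters receive gradient updates.
layer = MoLoRALinear(d_in=32, d_out=32)
opt = torch.optim.AdamW((p for p in layer.parameters() if p.requires_grad), lr=1e-3)
x, y = torch.randn(64, 32), torch.randn(64, 32)
loss = rloo_step(layer, x, y, k=4)
opt.zero_grad(); loss.backward(); opt.step()
```

The leave-one-out baseline is what keeps the estimator unbiased while cutting variance: each rollout's reward is compared against the mean of the other k − 1 rollouts rather than against a learned value function.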
Related Articles

PearlOS. We gave swarm intelligence a local desktop environment and code control to self-evolve. Has been pretty incredible to see so far. Open source and free if you want your own.
Reddit r/LocalLLaMA
QwenDean-4B | fine-tuned SLM for UIGen; our first attempt, looking for feedback!
Reddit r/LocalLLaMA
acestep.cpp: portable C++17 implementation of ACE-Step 1.5 music generation using GGML. Runs on CPU, CUDA, ROCm, Metal, Vulkan
Reddit r/LocalLLaMA

Introducing SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding
Hugging Face Blog

Newest GPU server in the lab! 72GB Ampere VRAM!
Reddit r/LocalLLaMA