ReMix: Reinforcement routing for mixtures of LoRAs in LLM finetuning
arXiv cs.LG / 3/12/2026
📰 News · Models & Research
Key Points
- Mixture-of-LoRAs can suffer from imbalanced routing weights, causing only a few LoRAs to dominate and limiting expressivity.
- ReMix introduces non-learnable routing weights to keep all active LoRAs effective, preventing domination by a single LoRA.
- To train with non-learnable weights, ReMix uses an unbiased gradient estimator based on REINFORCE leave-one-out (RLOO), treating the supervision loss as the reward.
- Extensive experiments show ReMix significantly outperforms state-of-the-art parameter-efficient fine-tuning methods with a comparable number of activated parameters.
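The REINFORCE leave-one-out estimator referenced above can be sketched in a few lines. This is a minimal illustration of the general RLOO advantage computation, not ReMix's actual implementation: it assumes K routings are sampled independently, each is scored by its supervision loss (negative loss as reward), and each sample's score-function gradient is weighted by its reward minus the mean reward of the other K−1 samples.

```python
def rloo_advantages(losses):
    """Leave-one-out advantages for a REINFORCE-style routing update.

    losses: supervision losses for K independently sampled routings
            (lower is better); reward_i = -loss_i.
    Returns A_i = r_i - mean_{j != i} r_j. Because each baseline
    excludes its own sample, the estimator stays unbiased while the
    baseline reduces variance.
    """
    rewards = [-l for l in losses]
    k = len(rewards)
    total = sum(rewards)
    # Baseline for sample i: average reward of the other k-1 samples.
    return [r - (total - r) / (k - 1) for r in rewards]
```

In a training loop, the policy-gradient term for routing sample i would weight the gradient of its routing log-probability by `A_i`; note the advantages always sum to zero, so equally good samples yield no update.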