ConMeZO: Adaptive Descent-Direction Sampling for Gradient-Free Finetuning of Large Language Models
arXiv stat.ML / 4/21/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper introduces ConMeZO, a derivative-free (zeroth-order) optimizer for fine-tuning large language models that sidesteps the memory overhead of backpropagation.
- ConMeZO speeds up convergence by adaptively sampling descent directions within a cone around a momentum-based estimate, rather than drawing uniform random directions in the high-dimensional parameter space (a minimal sketch of this idea follows the key points).
- The authors provide theoretical analysis showing ConMeZO matches MeZO’s worst-case convergence rate.
- Experiments on natural-language fine-tuning tasks indicate ConMeZO can be up to 2× faster than MeZO while maintaining the low-memory benefits of zeroth-order training.
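The cone-sampling idea in the second bullet is simple to state in code. Below is a minimal NumPy sketch, not the authors' implementation: the function name `conmezo_step`, the mixing weight `alpha`, and all default hyperparameters are illustrative assumptions, and the paper's exact cone parameterization may differ. It combines a MeZO-style two-point finite-difference probe with a random direction tilted toward a momentum estimate.

```python
import numpy as np

def conmezo_step(f, x, m, eps=1e-3, lr=1e-2, beta=0.9, alpha=0.5):
    """One illustrative ConMeZO-style update (names and defaults are assumptions).

    f     -- scalar loss function of the parameter vector
    x     -- current parameters (1-D array)
    m     -- running momentum estimate of the descent direction
    alpha -- cone-mixing weight: 0 recovers uniform (MeZO-like) sampling,
             values near 1 concentrate samples around the momentum direction
    """
    # Draw a random unit direction, then tilt it toward the momentum
    # estimate so it lies inside a cone around m.
    z = np.random.randn(x.size)
    z /= np.linalg.norm(z)
    m_norm = np.linalg.norm(m)
    if m_norm > 0:
        u = alpha * (m / m_norm) + (1.0 - alpha) * z
        u /= np.linalg.norm(u)
    else:
        u = z  # no momentum yet: fall back to a uniform random direction

    # Two-point finite-difference estimate of the directional derivative,
    # the same kind of zeroth-order probe MeZO uses.
    g = (f(x + eps * u) - f(x - eps * u)) / (2.0 * eps)

    x_new = x - lr * g * u                      # step along the sampled direction
    m_new = beta * m + (1.0 - beta) * (-g * u)  # refresh the descent estimate
    return x_new, m_new

# Toy usage: minimize a quadratic without ever computing a gradient.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=50)
    m = np.zeros_like(x)
    loss = lambda w: float(np.dot(w, w))
    for _ in range(3000):
        x, m = conmezo_step(loss, x, m)
    print(f"final loss: {loss(x):.4f}")
```

The linear interpolation between the momentum direction and a fresh random direction is just one way to realize "sampling within a cone"; the paper may control the cone angle differently. The sketch also materializes full perturbation vectors for clarity, whereas memory-efficient zeroth-order trainers like MeZO regenerate them from a random seed to keep memory flat.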