Two-Stage Optimizer-Aware Online Data Selection for Large Language Models
arXiv cs.AI / 4/2/2026
Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper argues that gradient-based data selection for LLM fine-tuning, while effective offline, is ill-suited to online fine-tuning, where data arrives sequentially and a sample's utility depends on the current optimization step and the local loss geometry.
- It proposes an optimizer-aware online data selection framework that treats selection as shaping the next target-oriented parameter update under the current optimizer state.
- The method formulates online selection as an update-matching problem tied to second-order target utility, emphasizing that the chosen subset must account for interactions and redundancy among samples (a rough sketch follows this list).
- To make this practical for long-context LLMs, it introduces a two-stage Filter-then-Weight algorithm, together with factorized outer-product gradient representations and optimized matrix computations (see the second sketch below).
- Experiments indicate consistent improvements in convergence and downstream task performance compared with existing online data selection baselines under the same data budget.
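The summary above does not reproduce the paper's exact objective, so the following is a minimal sketch, assuming Adam as the optimizer, a diagonal-preconditioner approximation of its update, and a held-out target gradient standing in for the target-oriented direction. All names here (`optimizer_aware_scores`, `filter_then_weight`, `g_target`, the ridge term) are illustrative assumptions, not the authors' implementation.

```python
import torch

def optimizer_aware_scores(per_sample_grads, exp_avg_sq, g_target, eps=1e-8):
    """First-order utility of each candidate under the current Adam state.

    per_sample_grads: (n, d) flattened per-sample gradients
    exp_avg_sq:       (d,)   Adam second-moment state v_t for the same parameters
    g_target:         (d,)   gradient of a target / held-out objective
    """
    # Diagonal Adam preconditioner: a step on sample i moves parameters roughly
    # along -g_i / (sqrt(v_t) + eps), so score sample i by how much that step
    # decreases the target loss, i.e. <g_i / (sqrt(v_t) + eps), g_target>.
    precond = 1.0 / (exp_avg_sq.sqrt() + eps)
    return per_sample_grads @ (precond * g_target)

def filter_then_weight(per_sample_grads, exp_avg_sq, g_target, k=64,
                       ridge=1e-3, eps=1e-8):
    """Two-stage selection sketch: (1) filter candidates by their individual
    optimizer-aware scores, (2) re-weight the survivors so that their combined
    preconditioned update matches the target-oriented update, which down-weights
    redundant (mutually similar) samples via the Gram matrix."""
    scores = optimizer_aware_scores(per_sample_grads, exp_avg_sq, g_target, eps)
    keep = torch.topk(scores, k=min(k, scores.numel())).indices  # stage 1: filter

    precond = 1.0 / (exp_avg_sq.sqrt() + eps)
    G = per_sample_grads[keep] * precond       # (k, d) preconditioned gradients
    t = precond * g_target                     # (d,)  preconditioned target update

    # Stage 2: weights w minimizing ||G^T w - t||^2 + ridge * ||w||^2.
    # The Gram matrix G G^T encodes pairwise sample interactions / redundancy.
    gram = G @ G.T
    eye = torch.eye(gram.shape[0], dtype=gram.dtype, device=gram.device)
    w = torch.linalg.solve(gram + ridge * eye, G @ t)
    return keep, w.clamp(min=0.0)              # indices and non-negative weights
```

In use, one would compute `per_sample_grads` for the incoming candidates at the current step, read `exp_avg_sq` from the live Adam state, and apply the returned weights when forming the training batch; how the paper defines the target objective and the second-order utility term is not specified in this summary.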
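The "factorized outer-product gradient representations" point plausibly refers to the standard observation that, for a linear layer, the per-sample weight gradient is an outer product of the layer input and the output gradient, so gradient inner products factor into two small dot products. A minimal sketch of that idea follows, assuming a single input vector per sample; the shapes and function name are my own, not the paper's notation.

```python
import torch

def factorized_gram(acts, out_grads):
    """Gram matrix of per-sample weight gradients for one linear layer,
    without materializing the (n, d_out * d_in) per-sample gradient matrix.

    acts:      (n, d_in)  layer inputs for each sample
    out_grads: (n, d_out) gradients of the loss w.r.t. the layer outputs

    The per-sample gradient of W is the outer product out_grads[i] acts[i]^T,
    so <g_i, g_j> = <acts_i, acts_j> * <out_grads_i, out_grads_j>.
    """
    return (acts @ acts.T) * (out_grads @ out_grads.T)
```

Presumably such factorizations, together with the optimized matrix computations the paper mentions, are what keep the Gram-matrix step of the weighting stage affordable at LLM scale.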