Automatic Configuration of LLM Post-Training Pipelines
arXiv cs.LG · March 20, 2026
Tags: Opinion · Tools & Practical Usage · Models & Research
Key Points
- The paper introduces AutoPipe, a budget-aware two-stage framework to configure LLM post-training pipelines (supervised fine-tuning and reinforcement learning) under realistic compute budgets.
- It combines an offline dataset-conditioned learning-to-rank surrogate with online Bayesian optimization, using a Gaussian-process residual to tailor guidance to each dataset.
- To cut evaluation cost, each trial is early-stopped and scored by a learned predictor that maps early training signals to a low-cost proxy for final post-training performance.
- Experiments on biomedical reasoning tasks show AutoPipe outperforms offline-only baselines and achieves comparable performance to the strongest online HPO baselines while using less than 10% of their computational cost.
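The two-stage idea described above — an offline surrogate providing a dataset-conditioned prior score, corrected online by a Gaussian-process model of the residual between observed and predicted performance — can be sketched in miniature. Everything below is a hypothetical illustration, not the paper's implementation: `offline_prior` and `proxy_objective` are stand-ins for the learned learning-to-rank surrogate and the early-stopped performance predictor, and the acquisition is a simple UCB over prior-plus-residual.

```python
import numpy as np

def rbf_kernel(A, B, length=0.3, var=1.0):
    """Squared-exponential kernel between row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / length ** 2)

def gp_residual_posterior(X, r, Xq, noise=1e-4):
    """GP posterior mean/std of the residual r at query points Xq."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(Xq, X)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, r))
    mu = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.maximum(rbf_kernel(Xq, Xq).diagonal() - (v ** 2).sum(0), 1e-12)
    return mu, np.sqrt(var)

def offline_prior(X):
    # Stand-in for the dataset-conditioned learning-to-rank surrogate:
    # a smooth score over a 1-D pipeline-configuration knob, peaked at 0.6.
    return 1.0 - (X[:, 0] - 0.6) ** 2

def proxy_objective(X):
    # Stand-in for the early-stopping predictor's low-cost performance
    # proxy; its true optimum (0.4) differs from the offline prior's.
    return 1.0 - (X[:, 0] - 0.4) ** 2

def run_online_bo(n_rounds=10, n_cand=200, beta=1.0, seed=0):
    rng = np.random.default_rng(seed)
    X_obs = rng.uniform(0.0, 1.0, (2, 1))          # two warm-start trials
    y_obs = proxy_objective(X_obs)
    for _ in range(n_rounds):
        Xc = rng.uniform(0.0, 1.0, (n_cand, 1))    # candidate configs
        resid = y_obs - offline_prior(X_obs)        # what the prior got wrong
        mu, sd = gp_residual_posterior(X_obs, resid, Xc)
        ucb = offline_prior(Xc) + mu + beta * sd    # prior + residual UCB
        x_next = Xc[np.argmax(ucb)][None, :]
        X_obs = np.vstack([X_obs, x_next])
        y_obs = np.append(y_obs, proxy_objective(x_next))
    best = int(np.argmax(y_obs))
    return X_obs[best, 0], float(y_obs[best])

best_x, best_y = run_online_bo()
print(f"best config knob ≈ {best_x:.2f}, proxy score ≈ {best_y:.3f}")
```

The design point the sketch tries to make concrete: the GP is fit only to the *residual*, so with few online observations the search defaults to the offline prior, and as trials accumulate the residual steers it toward the current dataset's actual optimum.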