Tula: Optimizing Time, Cost, and Generalization in Distributed Large-Batch Training
arXiv cs.LG / 3/20/2026
📰 News · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research
Key Points
- Tula is an online service that automatically optimizes training time, cost, and convergence quality for large-batch distributed training of convolutional models, combining parallel-systems modeling with statistical performance prediction.
- It predicts training time and cost with 7.5-14% error across multiple models, making it possible to identify the optimal batch size for a given resource budget and dataset.
- It achieves up to a 20x speedup and roughly a 9% average improvement in test accuracy over standard large-batch training across a range of vision tasks, narrowing the large-batch generalization gap.
- Rather than simply increasing the batch size, the method accounts for the knee point that communication overhead and memory limits create in the time/cost-versus-batch-size Pareto curve (see the sketch after this list).
- By optimizing the batch size automatically, Tula cuts training cost, speeds up experimentation, and informs infrastructure and scheduling decisions for distributed ML workloads.
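The knee-point behavior is easy to see in a toy analytic model. The sketch below is a minimal illustration, not Tula's actual predictor: every constant (per-sample compute time, all-reduce overhead, GPU price, memory limit) is an assumed value chosen for demonstration. It sweeps the global batch size under a synchronous data-parallel cost model and picks the smallest batch whose cost sits near the minimum, which is where the knee lives.

```python
# Minimal sketch, not Tula's actual model: a toy analytic time/cost model for
# synchronous data-parallel training, swept over the global batch size to
# expose the knee point described above. Every constant here is an assumed,
# illustrative value, not a number from the paper.

TOTAL_SAMPLES = 1_281_167   # assumed dataset size (ImageNet-1k scale)
EPOCHS = 90                 # assumed epoch budget
WORKERS = 64                # assumed number of GPUs
PRICE_PER_GPU_HOUR = 2.0    # assumed cloud price in $/GPU-hour
MAX_PER_GPU_BATCH = 256     # assumed per-GPU memory limit

def step_time(global_batch: int) -> float:
    """Seconds per optimization step: per-GPU compute grows with the local
    batch, while the gradient all-reduce cost is roughly batch-independent."""
    local_batch = global_batch / WORKERS
    compute = 0.01 * local_batch   # assumed linear compute scaling
    allreduce = 0.15               # assumed fixed communication overhead
    return compute + allreduce

def train_time_hours(global_batch: int) -> float:
    steps = TOTAL_SAMPLES * EPOCHS / global_batch
    return steps * step_time(global_batch) / 3600.0

def dollar_cost(global_batch: int) -> float:
    return train_time_hours(global_batch) * WORKERS * PRICE_PER_GPU_HOUR

if __name__ == "__main__":
    # The memory limit caps the feasible global batch, as in the key points.
    candidates = [b for b in (512, 1024, 2048, 4096, 8192, 16384, 32768)
                  if b / WORKERS <= MAX_PER_GPU_BATCH]
    for b in candidates:
        print(f"B={b:>6}  time={train_time_hours(b):6.2f} h"
              f"  cost=${dollar_cost(b):8.2f}")

    # Knee heuristic: the smallest batch whose cost is within 10% of the
    # minimum. Past this point, larger batches barely cut time or cost
    # (communication dominates) while the generalization gap keeps growing.
    min_cost = min(dollar_cost(b) for b in candidates)
    knee = next(b for b in candidates if dollar_cost(b) <= 1.10 * min_cost)
    print(f"knee-point batch size under this toy model: {knee}")
```

Per the key points above, Tula replaces hand-picked constants like these with parallel-systems modeling and statistically fitted performance predictions, which is how it reaches the quoted 7.5-14% prediction error.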
Related Articles

Easing veterans' burden of training junior engineers: generating PLC-control "ladder diagrams" with AI
日経XTECH

Your AI generated code is "almost right", and that is actually WORSE than it being "wrong".
Dev.to

Lessons from Academic Plagiarism Tools for SaaS Product Development
Dev.to

Windsurf’s New Pricing Explained: Simpler AI Coding or Hidden Trade-Offs?
Dev.to

Building Production RAG Systems with PostgreSQL: Complete Implementation Guide
Dev.to