[D] How to increase/optimize for gpu utilization while doing model training?
Reddit r/MachineLearning / 3/12/2026
💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage

[Image: a Weights & Biases graph showing GPU utilization]

So, I've been pretraining a deep learning model, specifically the Zipformer model. I've optimized my configs a lot to ensure full GPU utilization: using WebDataset to pack my datasets, using the proper number of workers to load data, and so on. Windows Task Manager shows my GPU at 100% utilization consistently, but WandB shows this. How do I find bottlenecks and optimize for them? What could the potential issues be?
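A common first step for the kind of bottleneck hunt the author describes is to separate time spent blocked on the dataloader from time spent in the training step itself. The sketch below is framework-agnostic and illustrative (the function names are not from the post, and on a GPU you would additionally need to synchronize before timestamps for accurate compute timings):

```python
import time

def profile_loop(batches, step_fn):
    """Split wall-clock time between waiting on the data iterator
    and running the training step. Returns (data_s, compute_s).
    Note: with async GPU execution, call a synchronize() inside
    step_fn (e.g. torch.cuda.synchronize()) for honest numbers."""
    data_s = compute_s = 0.0
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)       # blocks if workers can't keep up
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)             # forward/backward/optimizer step
        t2 = time.perf_counter()
        data_s += t1 - t0
        compute_s += t2 - t1
    return data_s, compute_s
```

If `data_s` dominates, the bottleneck is loading/augmentation (more workers, faster storage, more packing); if `compute_s` dominates, the GPU really is the limit.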
Key Points
- The author is pretraining a Zipformer model and reports tuning configs to maximize GPU utilization, including using WebDataset to pack the dataset and setting an appropriate number of data-loading workers.
- They notice that Windows Task Manager shows 100% GPU utilization while WandB indicates something else, and they are asking how to identify bottlenecks and further optimize performance.
- A GitHub link is provided to the Icefall repository's Zipformer training script for reference.
- The post invites discussion on potential issues causing suboptimal utilization and on methods to diagnose bottlenecks across data loading, I/O, and computation.
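One reason the two tools can disagree: the "GPU utilization" percentage only reports that some kernel was resident during the sampling window, not how busy the compute units actually were, so 100% in Task Manager is compatible with a starved GPU. Polling `nvidia-smi` directly (a real CLI and real query fields; the helper functions below are an illustrative sketch) gives per-second numbers you can correlate with training throughput:

```python
import subprocess

# Real nvidia-smi query flags; fields are documented in `nvidia-smi --help-query-gpu`.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,utilization.memory,power.draw",
    "--format=csv,noheader,nounits",
]

def parse_smi_csv(text):
    """Parse nvidia-smi CSV output (one line per GPU) into float tuples."""
    return [
        tuple(float(v) for v in line.split(","))
        for line in text.strip().splitlines()
    ]

def sample_gpus():
    """Poll nvidia-smi once; returns [] when the tool is unavailable."""
    try:
        out = subprocess.run(
            QUERY, capture_output=True, text=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_smi_csv(out)
```

High `utilization.gpu` with low power draw relative to the card's TDP is a classic sign of short, stalled kernels, i.e. an input-pipeline or CPU-side bottleneck rather than genuine compute saturation.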