How do you experiment with a (very) large model architecture? [D]

Reddit r/MachineLearning / 5/5/2026

💬 Opinion · Ideas & Deep Analysis · Tools & Practical Usage

Key Points

  • The post asks how to run quick experiments to validate hypotheses when reproducing compute-intensive large-model training, specifically for a particular diffusion model paper.
  • It outlines common strategies such as using only a small fraction of the dataset (about 5–10%), reducing batch size while adjusting the learning rate, and cutting the number of training epochs/iterations.
  • The author questions whether additional or contradictory practices exist beyond what can be inferred from online resources and general knowledge from LLMs.
  • Overall, it focuses on experimental design and cost-saving methods for large-scale model training rather than reporting a new model or study result.

I'm trying to reproduce a paper (a very particular kind of diffusion model), and their training regime is incredibly compute-heavy.

In general, how are quick experiments performed to validate hypotheses when the models are large and compute is expensive?

Some cursory browsing yields the following:

1. Using only 5-10% of the entire dataset.
2. Drastically reducing the batch size and compensating for it in the learning rate.
3. Reducing the number of epochs/iterations.
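For concreteness, here is a minimal PyTorch sketch of what those three shortcuts can look like in a debug run. Everything in it (the synthetic dataset, the tiny model, the baseline batch size, learning rate, and iteration budget, and the linear LR-scaling heuristic) is an assumed placeholder for illustration, not something taken from the paper being reproduced.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, Subset, TensorDataset

# --- Assumed "full scale" setup (placeholders, not values from any paper) ---
full_dataset = TensorDataset(torch.randn(10_000, 64), torch.randn(10_000, 64))
base_batch_size, base_lr = 1024, 1e-4

# 1) Keep only ~5-10% of the data, sampled at random.
subset_frac = 0.05
keep = torch.randperm(len(full_dataset))[: int(subset_frac * len(full_dataset))]
debug_dataset = Subset(full_dataset, keep.tolist())

# 2) Shrink the batch and rescale the learning rate. Linear scaling
#    (lr proportional to batch size) is one common heuristic; sqrt scaling is another.
debug_batch_size = 64
debug_lr = base_lr * debug_batch_size / base_batch_size

# 3) Cap the number of optimizer steps instead of training to convergence.
debug_iterations = 200

model = nn.Sequential(nn.Linear(64, 256), nn.SiLU(), nn.Linear(256, 64))
optimizer = torch.optim.AdamW(model.parameters(), lr=debug_lr)
loader = DataLoader(debug_dataset, batch_size=debug_batch_size, shuffle=True)

step = 0
while step < debug_iterations:
    for x, y in loader:
        loss = nn.functional.mse_loss(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        step += 1
        if step >= debug_iterations:
            break
```

Whether linear or square-root LR scaling behaves better tends to depend on the optimizer and on how far the batch size moves, so the rescaled value is a starting point to tune, not a rule.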

But I've had to infer these from online resources and from what LLMs tell me. Are there other practices beyond these, or anything that contradicts them?

submitted by /u/Aathishs04