Excellent discussion about LLM scaling [D]

Reddit r/MachineLearning / 5/4/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The article shares an in-depth discussion of how memory and compute requirements scale for large language models (LLMs).
  • It argues that using LLMs locally or on private cloud can be inefficient, depending on how scaling behavior interacts with your workload.
  • A key takeaway is that inference becomes more efficient when using large batching strategies, enabled by favorable memory/compute scaling.
  • The piece recommends a linked discussion video featuring Reiner Pope about how GPT, Claude, and Gemini are trained and served.

I came across an excellent in-depth discussion of memory and compute scaling analysis for LLMs. One takeaway is that running LLMs locally or on a private cloud is wasteful: memory/compute scaling makes large batching during inference very efficient.
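To see why batching helps, here is a back-of-the-envelope sketch (my own illustration, not from the linked discussion): for a single fp16 linear layer, the weights must be read from memory once per forward pass regardless of batch size, so the arithmetic intensity (FLOPs per byte moved) grows roughly linearly with batch size until activation traffic dominates. A lone local user at batch size 1 is memory-bandwidth-bound; a provider serving many requests at once gets far more useful FLOPs per byte.

```python
# Rough arithmetic-intensity model for one linear layer W (d_in x d_out).
# All numbers here are illustrative assumptions, not figures from the video.

def arithmetic_intensity(d_in, d_out, batch, bytes_per_param=2):
    flops = 2 * batch * d_in * d_out                     # multiply-accumulates
    weight_bytes = d_in * d_out * bytes_per_param        # read once per pass
    act_bytes = batch * (d_in + d_out) * bytes_per_param # grows with batch
    return flops / (weight_bytes + act_bytes)

d = 4096
for b in (1, 32, 512):
    print(b, round(arithmetic_intensity(d, d, b), 1))
# → 1 1.0
# → 32 31.5
# → 512 409.6
```

At batch size 1 the layer does about one FLOP per byte of memory traffic, far below what modern accelerators can sustain; at batch size 512 the same weights are amortized over hundreds of tokens, which is the efficiency gap the post is pointing at.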

Highly recommend: How GPT, Claude, and Gemini are actually trained and served, with Reiner Pope.

submitted by /u/geneing