
[P] Zero-code runtime visibility for PyTorch training

Reddit r/MachineLearning / 3/20/2026


Key Points

  • TraceML adds a zero-code mode that enables a live runtime view during PyTorch training via the command traceml watch train.py.
  • It displays a live terminal view of system and process metrics while stdout/stderr remains visible, enabling quick diagnostics without extra instrumentation.
  • The feature is aimed at fast feedback when a training run feels slow, serving as a first-pass check before adding heavier instrumentation or a full profiler.
  • A current limitation is that multi-node launches are not yet supported; the project repository is at https://github.com/traceopt-ai/traceml/.

https://preview.redd.it/kfjsajv7h7qg1.png?width=1862&format=png&auto=webp&s=373b5d81aa2bb3b7fcff2e09cab9c17cd73d9c20

I added a zero-code mode to TraceML (open source):

traceml watch train.py 

It gives a live terminal view of system + process metrics during PyTorch training, with normal stdout/stderr still visible.
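To make the "zero-code" idea concrete, here is a minimal sketch of the general pattern: launch the training script unmodified as a child process, let it inherit the terminal so its stdout/stderr stay visible, and sample metrics from the parent while it runs. This is not TraceML's implementation or API. The names `sample_system`, `watch`, and `python_cmd` are hypothetical, and the sketch uses only the Python standard library, so it records just a timestamp and the 1-minute load average rather than the richer system and process metrics a real tool would show.

```python
import os
import subprocess
import sys
import time


def sample_system():
    """Return a small snapshot of system-level metrics (stdlib only)."""
    snapshot = {"time": time.time()}
    # os.getloadavg exists only on Unix-like systems and may raise OSError.
    if hasattr(os, "getloadavg"):
        try:
            snapshot["load_1m"] = os.getloadavg()[0]
        except OSError:
            pass
    return snapshot


def python_cmd(script, *args):
    """Build a command that runs `script` with the current interpreter."""
    return [sys.executable, script, *args]


def watch(cmd, interval=1.0):
    """Run `cmd` unmodified and sample metrics from the parent while it runs.

    Hypothetical helper for illustration only; not part of TraceML.
    """
    # No stdout/stderr redirection: the child inherits the terminal,
    # so its normal output remains visible.
    proc = subprocess.Popen(cmd)
    samples = []
    while True:
        samples.append(sample_system())
        try:
            # Doubles as the sleep between samples; returns once the child exits.
            proc.wait(timeout=interval)
            break
        except subprocess.TimeoutExpired:
            continue
    return proc.returncode, samples
```

Under these assumptions, calling `watch(python_cmd("train.py"))` would run the script as usual and return its exit code plus the collected samples. A real tool would render the samples live instead of only collecting them.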

Built for the case where a run feels slow and you want a quick first-pass view before adding instrumentation or reaching for a heavier profiler.

Current limitation: multi-node launches are not supported yet.

Repo: https://github.com/traceopt-ai/traceml/

submitted by /u/traceml-ai