[P] Zero-code runtime visibility for PyTorch training

Reddit r/MachineLearning / 3/20/2026


Key Points

TraceML adds a zero-code mode that enables a live runtime view during PyTorch training via the command traceml watch train.py.
It displays a live terminal view of system and process metrics while stdout/stderr remain visible, enabling quick diagnostics without extra instrumentation.
The feature is aimed at fast feedback when a training run feels slow, serving as a first-pass check before adding heavier instrumentation or a full profiler.
A current limitation is that multi-node launches are not yet supported; the project repository is at https://github.com/traceopt-ai/traceml/.

I added a zero-code mode to TraceML (open source):

traceml watch train.py

It gives a live terminal view of system + process metrics during PyTorch training, with normal stdout/stderr still visible.

Built for the case where a run feels slow and you want a quick first-pass view before adding instrumentation or reaching for a heavier profiler.

Current limitation: not for multi-node launches yet.
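
To make the "watch" idea concrete, here is a rough sketch of the general pattern: launch the training script as a child process, leave its stdout/stderr untouched, and sample a metric on a timer until it exits. This is not TraceML's implementation or API. The `watch` and `load_average` helpers are hypothetical names, it uses only the Python standard library, and the only system metric it samples is the 1-minute load average (available on Unix-like systems).

```python
import os
import subprocess
import sys
import time


def load_average():
    """Return the 1-minute system load average, or None where unavailable."""
    try:
        return os.getloadavg()[0]
    except (AttributeError, OSError):
        # os.getloadavg is missing on Windows and can fail on some systems.
        return None


def watch(cmd, interval=1.0):
    """Run cmd as a child process and sample metrics until it exits.

    Hypothetical helper for illustration only. The child inherits this
    process's stdout/stderr, so its normal output stays visible.
    Returns (exit_code, list_of_samples).
    """
    samples = []
    start = time.monotonic()
    proc = subprocess.Popen(cmd)  # no pipes: output is passed straight through
    while True:
        try:
            # Block for up to `interval` seconds waiting for the child to exit.
            returncode = proc.wait(timeout=interval)
            break
        except subprocess.TimeoutExpired:
            # Child is still running: record one sample and report it.
            sample = {
                "elapsed_s": round(time.monotonic() - start, 2),
                "load_1m": load_average(),
            }
            samples.append(sample)
            print(f"[watch] {sample}", file=sys.stderr)
    return returncode, samples
```

Calling `watch([sys.executable, "train.py"])` would run the script and print one sample line per second to stderr. A real tool would likely track per-process CPU, memory, and GPU metrics and render them in a proper terminal UI; this sketch only shows the no-code-changes wrapper shape.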
