STLGT: A Scalable Trace-Based Linear Graph Transformer for Tail Latency Prediction in Microservices
arXiv cs.AI / 4/30/2026
Key Points
- The paper introduces STLGT, a trace-based linear graph Transformer designed to forecast end-to-end p95 tail latency for microservice APIs to support proactive SLO management.
- STLGT represents service traces as span graphs and propagates cross-service dependencies with inference time scaling linearly with the graph size, addressing efficiency concerns at scale.
- A decoupled temporal module is used to capture non-stationary, bursty workload dynamics that make tail-latency prediction difficult.
- Experiments on DeathStarBench (personalized education microservices) and Alibaba traces show an average 8.5% improvement in MAPE over PERT-GNN, with up to 12× faster CPU inference at N=32 after preprocessing.
- Ablation results indicate that both the structure-aware linear graph Transformer and the temporal module are particularly effective under bursty traffic conditions.
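The paper's details are not reproduced here, but the core efficiency claim, attention over a span graph whose cost grows linearly rather than quadratically with the number of spans, can be illustrated with a generic kernelized linear-attention sketch. The weights, the `relu`-based feature map, and the dimensions below are illustrative assumptions, not STLGT's actual architecture:

```python
import numpy as np

def linear_attention(X, Wq, Wk, Wv):
    """Kernelized linear attention: by aggregating K^T V into a d x d summary
    first, we never materialize the N x N score matrix, so cost is O(N d^2)."""
    # Feature map phi(x) = relu(x) + eps keeps values nonnegative (assumed, for illustration)
    Q = np.maximum(X @ Wq, 0) + 1e-6
    K = np.maximum(X @ Wk, 0) + 1e-6
    V = X @ Wv
    KV = K.T @ V                  # d x d summary, independent of graph size N
    Z = Q @ K.sum(axis=0)         # per-span normalizer
    return (Q @ KV) / Z[:, None]

rng = np.random.default_rng(0)
N, d = 32, 16                     # e.g. a 32-span trace, matching the N=32 setting above
X = rng.standard_normal((N, d))   # hypothetical per-span embeddings
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = linear_attention(X, Wq, Wk, Wv)
print(out.shape)                  # one updated embedding per span
```

Doubling the number of spans here roughly doubles the work, whereas standard softmax attention would quadruple it, which is the scaling property the paper leverages for fast CPU inference.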