STLGT: A Scalable Trace-Based Linear Graph Transformer for Tail Latency Prediction in Microservices

arXiv cs.AI / 4/30/2026


Key Points

  • The paper introduces STLGT, a trace-based linear graph Transformer designed to forecast end-to-end p95 tail latency for microservice APIs to support proactive SLO management.
  • STLGT represents service traces as span graphs and propagates cross-service dependencies with inference time that scales linearly with span graph size, so inference stays efficient as traces grow.
  • A decoupled temporal module is used to capture non-stationary, bursty workload dynamics that make tail-latency prediction difficult.
  • Experiments on a personalized education microservice application, DeathStarBench, and Alibaba traces show an average 8.5% MAPE improvement over PERT-GNN, and up to 12× faster CPU inference at N=32, the maximum span graph size in the Alibaba traces after preprocessing.
  • Ablation results indicate that both the structure-aware linear graph Transformer and the temporal module are particularly effective under bursty traffic conditions.
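The forecasting target (per-window p95 latency) and the evaluation metric (MAPE) in the key points can be made concrete with a small sketch. It assumes requests are bucketed into fixed time windows and uses the nearest-rank percentile definition; the paper's actual windowing, percentile method, and forecast horizon are not given here, and all function names are hypothetical.

```python
from collections import defaultdict


def p95(latencies_ms):
    """Nearest-rank 95th percentile of one window's end-to-end latencies."""
    if not latencies_ms:
        raise ValueError("empty window")
    ordered = sorted(latencies_ms)
    # Integer ceil(0.95 * n), avoiding float rounding; rank is 1-based.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def windowed_p95(requests, window_s):
    """Group (timestamp_s, latency_ms) pairs into fixed time windows; return p95 per window."""
    buckets = defaultdict(list)
    for ts, latency in requests:
        buckets[int(ts // window_s)].append(latency)
    return {w: p95(lats) for w, lats in sorted(buckets.items())}


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; assumes no actual value is zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need equal-length, non-empty sequences")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Usage: two one-second windows of 100 requests each (synthetic data).
window0 = [(0.5, float(ms)) for ms in range(1, 101)]      # latencies 1..100 ms
window1 = [(1.5, float(2 * ms)) for ms in range(1, 101)]  # latencies 2..200 ms
series = windowed_p95(window0 + window1, window_s=1.0)
print(series)  # {0: 95.0, 1: 190.0}
print(round(mape([95.0, 190.0], [100.0, 180.0]), 2))  # 5.26
```

A multi-step forecaster would predict the next few values of `series`; MAPE then scores those predictions against the observed p95 values, weighting each error relative to the true latency.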

Abstract

Accurate end-to-end tail-latency forecasting is critical for proactive SLO management in microservice systems. However, modeling long-range dependency propagation and non-stationary, bursty workloads while maintaining inference efficiency at scale remains challenging. We present STLGT (Scalable Trace-based Linear Graph Transformer), a per-API predictor that encodes traces as span graphs for multi-step p95 tail-latency forecasting. STLGT uses a structure-aware linear graph Transformer to propagate cross-service dependencies with inference time linear in span graph size, and a decoupled temporal module to capture workload dynamics. Across a personalized education microservice application, DeathStarBench, and Alibaba traces, STLGT improves forecasting accuracy over PERT-GNN by 8.5% MAPE on average and achieves up to 12× faster CPU inference at N=32, which matches the maximum span graph size of the Alibaba traces after preprocessing. Ablation studies further demonstrate the effectiveness of each component, especially under bursty traffic.
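The linear-time claim rests on never materializing the N×N attention matrix over span-graph nodes. The NumPy sketch below uses generic kernelized linear attention with an elu(x)+1 feature map (the formulation of Katharopoulos et al., 2020), not STLGT's actual layer. The depth encoding is a hypothetical stand-in for "structure awareness", since the paper's structural encoding is not described here, and every function name is illustrative.

```python
import numpy as np


def span_depths(parents):
    """Depth of each span; parents[i] is span i's parent index, or -1 for the root span."""
    depth = [-1] * len(parents)

    def resolve(i):
        if depth[i] < 0:
            depth[i] = 0 if parents[i] < 0 else resolve(parents[i]) + 1
        return depth[i]

    for i in range(len(parents)):
        resolve(i)
    return np.array(depth)


def depth_encoding(depth, d):
    """Sinusoidal encoding of span depth (hypothetical structural feature); d must be even."""
    i = np.arange(d // 2)
    angles = depth[:, None] / (10000.0 ** (2 * i / d))
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)


def feature_map(x):
    # elu(x) + 1 keeps every feature positive, so attention weights stay positive.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(q, k, v):
    """Kernelized attention in O(N * d * d_v) time; the N x N matrix is never built."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v          # (d, d_v): one pass over all N spans
    z = phi_k.sum(axis=0)     # (d,): shared normalizer
    return (phi_q @ kv) / (phi_q @ z)[:, None]


def quadratic_attention(q, k, v):
    """Same kernel with an explicit N x N score matrix, O(N^2), used only for checking."""
    scores = feature_map(q) @ feature_map(k).T
    return (scores @ v) / scores.sum(axis=1, keepdims=True)


def span_graph_layer(x, parents, w_q, w_k, w_v):
    """One attention layer over span-node features x of shape (N, d)."""
    h = x + depth_encoding(span_depths(parents), x.shape[1])
    return linear_attention(h @ w_q, h @ w_k, h @ w_v)


# Usage: a random trace tree with N = 32 spans (synthetic features and weights).
rng = np.random.default_rng(0)
N, d = 32, 8
parents = [-1] + [int(rng.integers(0, i)) for i in range(1, N)]  # each parent precedes its child
x = rng.normal(size=(N, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
out = span_graph_layer(x, parents, w_q, w_k, w_v)
print(out.shape)  # (32, 8)
```

Because `phi_k.T @ v` and `phi_k.sum(axis=0)` are computed once and reused for every query, cost grows linearly in the number of spans, while `quadratic_attention` produces the same output at quadratic cost.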