STLGT: A Scalable Trace-Based Linear Graph Transformer for Tail Latency Prediction in Microservices

arXiv cs.AI / 4/30/2026

💬 OpinionSignals & Early TrendsIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper introduces STLGT, a trace-based linear graph Transformer designed to forecast end-to-end p95 tail latency for microservice APIs to support proactive SLO management.
STLGT represents service traces as span graphs and propagates cross-service dependencies with inference time scaling linearly with the graph size, addressing efficiency concerns at scale.
A decoupled temporal module is used to capture non-stationary, bursty workload dynamics that make tail-latency prediction difficult.
Experiments on DeathStarBench (personalized education microservices) and Alibaba traces show an average 8.5% improvement in MAPE over PERT-GNN, with up to 12× faster CPU inference at N=32 after preprocessing.
Ablation results indicate that both the structure-aware linear graph Transformer and the temporal module are particularly effective under bursty traffic conditions.

Abstract

Accurate end-to-end tail-latency forecasting is critical for proactive SLO management in microservice systems. However, modeling long-range dependency propagation and non-stationary, bursty workloads while maintaining inference efficiency at scale remains challenging. We present STLGT (Scalable Trace-based Linear Graph Transformer), a per-API predictor that encodes traces as span graphs for multi-step p95 tail-latency forecasting. STLGT uses a structure-aware linear graph Transformer to propagate cross-service dependencies with inference time linear in span graph size, and a decoupled temporal module to capture workload dynamics. Across a personalized education microservice application, DeathStarBench, and Alibaba traces, STLGT improves forecasting accuracy over PERT-GNN by 8.5% MAPE on average and achieves up to 12x faster CPU inference at N=32, matching the maximum span graph size after preprocessing the Alibaba traces. Ablation studies further demonstrate the effectiveness of each component, especially under bursty traffic.

Vector DB and ANN vs PHE conflict, is there a practical workaround? [D]

Reddit r/MachineLearning

Agent Amnesia and the Case of Henry Molaison

Dev.to

Azure Weekly: Microsoft and OpenAI Restructure Partnership as GPT-5.5 Lands in Foundry

Dev.to

Proven Patterns for OpenAI Codex in 2026: Prompts, Validation, and Gateway Governance

Dev.to

Vibe coding is a tool, not a shortcut. Most people are using it wrong.

Dev.to

STLGT: A Scalable Trace-Based Linear Graph Transformer for Tail Latency Prediction in Microservices

Key Points

Abstract

Related Articles

Vector DB and ANN vs PHE conflict, is there a practical workaround? [D]

Agent Amnesia and the Case of Henry Molaison

Azure Weekly: Microsoft and OpenAI Restructure Partnership as GPT-5.5 Lands in Foundry

Proven Patterns for OpenAI Codex in 2026: Prompts, Validation, and Gateway Governance

Vibe coding is a tool, not a shortcut. Most people are using it wrong.

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer