From Load Tests to Live Streams: Graph Embedding-Based Anomaly Detection in Microservice Architectures

arXiv cs.LG / 4/9/2026

Key Points

  • Prime Video load tests can fail to capture behaviors specific to real live-event and VOD traffic, motivating an anomaly detection method focused on those differences.
  • The paper proposes an unsupervised graph-embedding approach (GCN-GAE) that learns node-level representations of directed, weighted service graphs at minute-level resolution and flags under-represented services using cosine similarity between load-test and event embeddings.
  • Reported results indicate the system identifies documented incident-related services and demonstrates early-detection capability; a preliminary synthetic anomaly injection framework shows high precision (96%) and a low false positive rate (0.08%).
  • The study finds recall is still limited (58%) under conservative propagation assumptions, highlighting constraints in the current anomaly-propagation model.
  • Beyond the Prime Video deployment, the work offers methodological lessons and a foundation for applying similar techniques across broader microservice ecosystems.
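
The precision, recall, and false positive rate quoted in the points above are presumably the standard confusion-matrix ratios. The sketch below spells those definitions out. The counts are toy values invented for illustration and are not taken from the paper, and the function name `detection_metrics` is hypothetical.

```python
def detection_metrics(tp, fp, fn, tn):
    """Return (precision, recall, false_positive_rate) from confusion counts.

    tp: anomalous services correctly flagged
    fp: normal services incorrectly flagged
    fn: anomalous services that were missed
    tn: normal services correctly left unflagged
    A ratio with a zero denominator is reported as 0.0.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)            # how many flags were correct
    recall = ratio(tp, tp + fn)               # how many anomalies were found
    false_positive_rate = ratio(fp, fp + tn)  # how many normal services were flagged
    return precision, recall, false_positive_rate


# Toy counts (illustrative only, NOT the paper's data).
precision, recall, fpr = detection_metrics(tp=3, fp=1, fn=2, tn=94)
```

Because normal services typically far outnumber anomalous ones, a detector can keep both precision high and the false positive rate low while still missing a sizeable share of injected anomalies, which is the pattern the reported figures describe.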

Abstract

Prime Video regularly conducts load tests to simulate the viewer traffic spikes seen during live events such as Thursday Night Football as well as video-on-demand (VOD) events such as Rings of Power. While these stress tests validate system capacity, they can sometimes miss service behaviors unique to real event traffic. We present a graph-based anomaly detection system that identifies under-represented services using unsupervised node-level graph embeddings. Built on a GCN-GAE, our approach learns structural representations from directed, weighted service graphs at minute-level resolution and flags anomalies based on cosine similarity between load test and event embeddings. The system identifies documented incident-related services and demonstrates early detection capability. We also introduce a preliminary synthetic anomaly injection framework for controlled evaluation that shows promising precision (96%) and a low false positive rate (0.08%), though recall (58%) remains limited under conservative propagation assumptions. This framework demonstrates practical utility within Prime Video while also surfacing methodological lessons and directions, providing a foundation for broader application across microservice ecosystems.
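
The comparison step described in the abstract can be sketched as follows, assuming each service already has one embedding vector from the load test and one from the real event. This is a minimal illustration, not the authors' implementation: the GCN-GAE encoder that would produce the embeddings is omitted, and the service names, the toy vectors, the function names, and the `threshold` value of 0.5 are all hypothetical.

```python
import numpy as np


def cosine_similarity_rows(a, b, eps=1e-12):
    """Row-wise cosine similarity between two (n_services, dim) arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    dot = np.sum(a * b, axis=1)
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    # eps guards against division by zero for all-zero embeddings.
    return dot / np.maximum(norms, eps)


def flag_under_represented(services, load_test_emb, event_emb, threshold=0.5):
    """Return (service, similarity) pairs whose similarity is below threshold.

    A low similarity suggests the service's structural role during the event
    differed from its role during the load test. The threshold is an
    assumption made for this sketch.
    """
    sims = cosine_similarity_rows(load_test_emb, event_emb)
    return [(s, float(v)) for s, v in zip(services, sims) if v < threshold]


# Toy embeddings (hypothetical): one row per service, same order in both arrays.
services = ["playback", "catalog", "auth"]
load_test_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
event_emb = np.array([[1.0, 0.0], [1.0, 0.0], [2.0, 2.0]])

flagged = flag_under_represented(services, load_test_emb, event_emb)
```

In this toy data only `catalog` is flagged, because its event embedding is orthogonal to its load-test embedding. `auth` is not flagged even though its vector doubled in length, since cosine similarity compares direction rather than magnitude.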