From Load Tests to Live Streams: Graph Embedding-Based Anomaly Detection in Microservice Architectures

arXiv cs.LG / 4/9/2026



Key Points

Prime Video load tests can fail to capture behaviors specific to real live-event and VOD traffic, motivating an anomaly detection method focused on those differences.
The paper proposes an unsupervised graph-embedding approach (GCN-GAE) that learns node-level representations of directed, weighted service graphs at minute-level resolution and flags under-represented services via cosine similarity between load-test and event embeddings.
Reported results indicate the system identifies services tied to documented incidents and shows early-detection capability. A preliminary synthetic anomaly injection framework reports high precision (96%) and a low false positive rate (0.08%).
The study finds recall is still limited (58%) under conservative propagation assumptions, highlighting constraints in the current anomaly-propagation model.
Beyond the Prime Video deployment, the work provides methodological lessons and a baseline foundation for applying similar techniques across broader microservice ecosystems.
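
The comparison step described in the key points above can be illustrated with a minimal sketch. It assumes per-service embedding vectors have already been produced (for example by a GCN-GAE encoder, which is not implemented here) and only shows the cosine-similarity flagging. The service names, toy vectors, the `threshold` value, and the function names are all hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def flag_under_represented(load_test_emb, event_emb, threshold=0.5):
    """Return (service, similarity) pairs whose event embedding diverges
    from the load-test embedding.

    `threshold` is an assumed cutoff for illustration only. A service
    seen during the event but absent from the load test is flagged
    with a similarity of None.
    """
    flagged = []
    for service, event_vec in event_emb.items():
        load_vec = load_test_emb.get(service)
        if load_vec is None:
            flagged.append((service, None))
            continue
        sim = cosine_similarity(load_vec, event_vec)
        if sim < threshold:
            flagged.append((service, sim))
    return flagged


# Toy embeddings for two hypothetical services.
load_test = {"playback": [1.0, 0.0], "catalog": [0.6, 0.8]}
event = {"playback": [0.9, 0.1], "catalog": [-0.8, 0.6]}

flagged = flag_under_represented(load_test, event)
print(flagged)
```

In this toy input, `playback` points in nearly the same direction in both runs and is not flagged, while `catalog` is orthogonal between the two runs and is. In practice the cutoff would need to be calibrated against real traffic.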

Abstract

Prime Video regularly conducts load tests to simulate the viewer traffic spikes seen during live events such as Thursday Night Football as well as video-on-demand (VOD) events such as Rings of Power. While these stress tests validate system capacity, they can sometimes miss service behaviors unique to real event traffic. We present a graph-based anomaly detection system that identifies under-represented services using unsupervised node-level graph embeddings. Built on a GCN-GAE, our approach learns structural representations from directed, weighted service graphs at minute-level resolution and flags anomalies based on cosine similarity between load test and event embeddings. The system identifies documented incident-related services and demonstrates early detection capability. We also introduce a preliminary synthetic anomaly injection framework for controlled evaluation that shows promising precision (96%) and a low false positive rate (0.08%), though recall (58%) remains limited under conservative propagation assumptions. This framework demonstrates practical utility within Prime Video while also surfacing methodological lessons and directions, providing a foundation for broader application across microservice ecosystems.
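
For readers less familiar with the three metrics quoted in the abstract, the sketch below gives their standard definitions in terms of confusion-matrix counts. The function name and the example counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not the paper's data.

```python
def detection_metrics(tp, fp, fn, tn):
    """Standard precision, recall, and false positive rate.

    tp: anomalous services correctly flagged
    fp: normal services incorrectly flagged
    fn: anomalous services missed
    tn: normal services correctly left unflagged
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return precision, recall, false_positive_rate


# Illustrative counts only (not from the paper).
precision, recall, fpr = detection_metrics(tp=3, fp=1, fn=1, tn=5)
print(precision, recall, fpr)
```

Precision and false positive rate both depend on how many normal services are wrongly flagged, while recall depends on how many truly anomalous services are missed. This is why a detector can score well on the first two and still have limited recall.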

