From Natural Language to PromQL: A Catalog-Driven Framework with Dynamic Temporal Resolution for Cloud-Native Observability

arXiv cs.AI / April 16, 2026


Key Points

  • The paper introduces a catalog-driven framework that converts natural-language questions into executable PromQL queries to reduce the query-authoring barrier for observability users.
  • It combines a statically curated catalog of ~2,000 metrics with runtime discovery of hardware- and GPU-vendor-specific signals to support cloud-native environments.
  • A multi-stage query pipeline classifies intent, routes metrics by category, and applies multi-dimensional semantic scoring to improve the accuracy of the generated PromQL.
  • The framework includes dynamic temporal resolution that interprets varied natural-language time expressions and maps them to the correct PromQL duration syntax.
  • Integrated with the Model Context Protocol (MCP), the system enables tool-augmented LLM interactions across providers and was deployed on production Kubernetes clusters for AI inference workloads with ~1.1s end-to-end latency via the catalog path.
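The temporal-resolution idea above can be sketched as a phrase-and-pattern lookup. This is a minimal illustration, not the paper's actual mechanism; the phrase table, unit map, and default window are assumptions made for the example. PromQL duration syntax itself (`5m`, `2h`, `1d`, etc.) is standard.

```python
import re

# Hypothetical sketch of dynamic temporal resolution: map natural-language
# time expressions to PromQL duration strings. The rules below are
# illustrative assumptions, not the paper's implementation.
UNIT_MAP = {
    "second": "s", "seconds": "s", "sec": "s",
    "minute": "m", "minutes": "m", "min": "m",
    "hour": "h", "hours": "h", "hr": "h",
    "day": "d", "days": "d",
    "week": "w", "weeks": "w",
}

# Fixed phrases that carry an implicit duration.
PHRASE_MAP = {"yesterday": "1d", "last hour": "1h", "past hour": "1h"}

def resolve_duration(text: str, default: str = "5m") -> str:
    """Return a PromQL duration for a natural-language time expression."""
    lowered = text.lower()
    for phrase, duration in PHRASE_MAP.items():
        if phrase in lowered:
            return duration
    # Numeric expressions such as "last 15 minutes" or "past 2 hours".
    match = re.search(r"(\d+)\s*([a-z]+)", lowered)
    if match and match.group(2) in UNIT_MAP:
        return match.group(1) + UNIT_MAP[match.group(2)]
    return default  # fall back to an assumed default window
```

The resolved duration would then be spliced into a range selector, e.g. `rate(node_cpu_seconds_total[15m])`.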

Abstract

Modern cloud-native platforms expose thousands of time series metrics through systems like Prometheus, yet formulating correct queries in domain-specific languages such as PromQL remains a significant barrier for platform engineers and site reliability teams. We present a catalog-driven framework that translates natural language questions into executable PromQL queries, bridging the gap between human intent and observability data. Our approach introduces three contributions: (1) a hybrid metrics catalog that combines a statically curated base of approximately 2,000 metrics with runtime discovery of hardware-specific signals across GPU vendors, (2) a multi-stage query pipeline with intent classification, category-aware metric routing, and multi-dimensional semantic scoring, and (3) a dynamic temporal resolution mechanism that interprets diverse natural language time expressions and maps them to appropriate PromQL duration syntax. We integrate the framework with the Model Context Protocol (MCP) to enable tool-augmented LLM interactions across multiple providers. The catalog-driven approach achieves sub-second metric discovery through pre-computed category indices, with the full pipeline completing in approximately 1.1 seconds via the catalog path. The system has been deployed on production Kubernetes clusters managing AI inference workloads, where it supports natural language querying across approximately 2,000 metrics spanning cluster health, GPU utilization, and model-serving performance.
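The catalog path described in the abstract, with pre-computed category indices and semantic scoring, can be sketched roughly as below. The class, the token-overlap scoring, and the sample metric entries are assumptions for illustration only; the paper's actual scoring is multi-dimensional and not reproduced here.

```python
from collections import defaultdict

class MetricCatalog:
    """Illustrative sketch of category-indexed metric discovery.

    Metrics are indexed by category at build time, so discovery only
    scans one category's candidates rather than the full ~2,000 metrics.
    """

    def __init__(self, metrics):
        # metrics: list of dicts with "name", "category", "description".
        self.by_category = defaultdict(list)  # pre-computed category index
        for metric in metrics:
            self.by_category[metric["category"]].append(metric)

    def discover(self, category, question, top_k=3):
        """Rank one category's metrics by token overlap with the question."""
        q_tokens = set(question.lower().split())

        def score(metric):
            doc = (metric["name"] + " " + metric["description"]).lower()
            return sum(1 for token in q_tokens if token in doc)

        ranked = sorted(self.by_category[category], key=score, reverse=True)
        return [metric["name"] for metric in ranked[:top_k]]

# Sample entries (metric names follow common exporter conventions).
catalog = MetricCatalog([
    {"name": "DCGM_FI_DEV_GPU_UTIL", "category": "gpu",
     "description": "GPU utilization percentage"},
    {"name": "DCGM_FI_DEV_FB_USED", "category": "gpu",
     "description": "GPU framebuffer memory used"},
    {"name": "node_cpu_seconds_total", "category": "cluster",
     "description": "CPU time consumed per node"},
])
print(catalog.discover("gpu", "what is the GPU utilization right now?"))
```

In the full pipeline, the intent classifier would select the category argument, and the top-ranked metric name would be combined with the resolved time window to assemble the final PromQL query.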