MetaKube: An Experience-Aware LLM Framework for Kubernetes Failure Diagnosis

arXiv cs.LG · March 26, 2026


Key Points

  • MetaKube is introduced as an experience-aware LLM framework for Kubernetes failure diagnosis that learns from historical resolutions rather than relying only on static knowledge bases.
  • The system combines an Episodic Pattern Memory Network (EPMN) for confidence-calibrated retrieval, a meta-cognitive controller that switches between intuitive and analytical reasoning, and KubeLLM (a locally deployable 8B model) post-trained on a 7,000-sample Kubernetes fault resolution dataset.
  • In evaluations on 1,873 real-world scenarios, MetaKube improved Qwen3-8B scores from 50.9 to 90.5 and claims to approach GPT-4.1-like performance while preserving data privacy via local deployment.
  • Experiments indicate the episodic experiential learning component contributes a 15.3% improvement, with continuous-learning tests showing progressively better results as it accumulates operational knowledge.
  • The authors provide source code and resources publicly on GitHub for reuse and further experimentation.

Abstract

Existing LLM-based Kubernetes diagnostic systems cannot learn from operational experience: they operate on static knowledge bases and do not improve from past resolutions. We present MetaKube, an experience-aware LLM framework built on three synergistic innovations: (1) an Episodic Pattern Memory Network (EPMN) that abstracts diagnostic patterns from historical resolutions and provides confidence-calibrated retrieval for both rapid pattern matching and guided causal exploration, (2) a meta-cognitive controller that dynamically routes between intuitive and analytical pathways based on problem familiarity, optimizing the trade-off between speed and depth, and (3) KubeLLM, a locally deployable 8B model enhanced through domain-specific post-training on our 7,000-sample Kubernetes Fault Resolution Dataset. Evaluation on 1,873 real-world scenarios demonstrates that MetaKube lifts Qwen3-8B from 50.9 to 90.5 points, approaching GPT-4.1 performance while ensuring complete data privacy. EPMN contributes a 15.3% improvement through experiential learning, and continuous-learning experiments show progressive gains as the system accumulates operational knowledge. The source code and related resources are available at https://github.com/MetaKube-LLM-for-Kubernetes-Diagnosis/MetaKube.
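To make the routing idea concrete, the sketch below illustrates the general shape of a confidence-calibrated episodic memory feeding a familiarity-based router. This is a minimal toy, not the paper's implementation: the `Episode` fields, the keyword-overlap similarity, and the `FAMILIARITY_THRESHOLD` value are all illustrative assumptions, standing in for whatever representation and calibration EPMN actually uses.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Episode:
    """One abstracted diagnostic pattern distilled from a past resolution."""
    symptom: str
    root_cause: str
    fix: str
    confidence: float  # calibrated retrieval confidence in [0, 1]

class EpisodicMemory:
    """Toy stand-in for EPMN: stores patterns from resolved incidents and
    returns the best match with a similarity score used as confidence."""

    def __init__(self) -> None:
        self.episodes: List[Episode] = []

    def add(self, episode: Episode) -> None:
        self.episodes.append(episode)

    def retrieve(self, symptom: str) -> Optional[Episode]:
        # Jaccard keyword overlap as a placeholder similarity measure.
        def overlap(e: Episode) -> float:
            a = set(symptom.lower().split())
            b = set(e.symptom.lower().split())
            return len(a & b) / max(len(a | b), 1)

        if not self.episodes:
            return None
        best = max(self.episodes, key=overlap)
        score = overlap(best)
        if score == 0:
            return None
        return Episode(best.symptom, best.root_cause, best.fix, score)

# Assumed routing threshold; the paper does not publish this value.
FAMILIARITY_THRESHOLD = 0.5

def diagnose(memory: EpisodicMemory, symptom: str) -> str:
    """Meta-cognitive routing: reuse a stored fix when the problem looks
    familiar, otherwise fall back to slower analytical exploration."""
    match = memory.retrieve(symptom)
    if match is not None and match.confidence >= FAMILIARITY_THRESHOLD:
        return f"intuitive: {match.fix}"
    return "analytical: run guided causal exploration with the LLM"
```

A familiar symptom (high retrieval confidence) takes the fast intuitive path; an unseen one falls through to the analytical path, mirroring the speed-versus-depth trade-off the abstract describes.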