MetaKube: An Experience-Aware LLM Framework for Kubernetes Failure Diagnosis
arXiv cs.LG / 3/26/2026
Key Points
- MetaKube is introduced as an experience-aware LLM framework for Kubernetes failure diagnosis that learns from historical resolutions rather than relying only on static knowledge bases.
- The system combines three parts: an Episodic Pattern Memory Network (EPMN) for confidence-calibrated retrieval, a meta-cognitive controller that switches between intuitive and analytical reasoning, and KubeLLM, a locally deployable 8B model post-trained on a 7,000-sample Kubernetes fault resolution dataset.
- In evaluations on 1,873 real-world scenarios, MetaKube raised the Qwen3-8B score from 50.9 to 90.5. The authors report that this approaches GPT-4.1 performance while preserving data privacy through local deployment.
- Experiments attribute a 15.3% improvement to the episodic experiential learning component, and continuous-learning tests show results improving progressively as the system accumulates operational knowledge.
- The authors provide source code and resources publicly on GitHub for reuse and further experimentation.
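
To make the retrieve-then-route idea in the key points concrete, here is a minimal sketch of confidence-gated routing between a fast "intuitive" path and a slower "analytical" path. It is not the authors' implementation: the `Episode` class, the Jaccard similarity used as a stand-in for a calibrated confidence score, the `diagnose` function, and the 0.6 threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """One past incident: observed symptoms and the fix that resolved it (hypothetical schema)."""
    symptoms: frozenset
    resolution: str


def similarity(a: frozenset, b: frozenset) -> float:
    """Jaccard overlap of two symptom sets; a simple stand-in for a learned, calibrated score."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(memory: list, symptoms: frozenset):
    """Return the best-matching episode and its score, or (None, 0.0) if nothing overlaps."""
    best, best_score = None, 0.0
    for episode in memory:
        score = similarity(episode.symptoms, symptoms)
        if score > best_score:
            best, best_score = episode, score
    return best, best_score


def diagnose(memory: list, symptoms: frozenset, threshold: float = 0.6):
    """Route to the intuitive path when retrieval is confident, else to the analytical path.

    The threshold is an assumed value, not one taken from the paper.
    """
    episode, confidence = retrieve(memory, symptoms)
    if episode is not None and confidence >= threshold:
        # Confident match: reuse the remembered resolution directly.
        return "intuitive", episode.resolution
    # Low confidence: hand off to slower step-by-step reasoning (placeholder here).
    return "analytical", "escalate to step-by-step analysis"


memory = [
    Episode(frozenset({"CrashLoopBackOff", "OOMKilled"}), "raise the container memory limit"),
    Episode(frozenset({"ImagePullBackOff", "401"}), "fix the image pull secret"),
]

known = diagnose(memory, frozenset({"CrashLoopBackOff", "OOMKilled"}))
novel = diagnose(memory, frozenset({"NodeNotReady"}))
```

In this sketch a failure whose symptoms closely match a stored episode is answered from memory, while an unfamiliar failure falls through to the analytical path. A system that learns continuously would also append each newly resolved incident to `memory`, which is one plausible reading of how results could improve as experience accumulates.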