Instruction-Tuned LLMs for Parsing and Mining Unstructured Logs on Leadership HPC Systems
arXiv cs.AI / 4/8/2026
Key Points
- The paper proposes a domain-adapted, instruction-following LLM framework to parse and mine largely unstructured, heterogeneous system logs from leadership-class HPC systems.
- It fine-tunes an 8B-parameter LLaMA model on HPC log-template data using instruction-style examples and a hybrid fine-tuning strategy (including CoT-style reasoning) to achieve high-fidelity structure extraction (a sketch of such a training record follows this list).
- The method is designed for privacy-preserving, locally deployable, fast, and energy-efficient log mining rather than relying on external/cloud services.
- Experiments on LogHub datasets show parsing accuracy comparable to much larger models (e.g., LLaMA 70B and Claude), suggesting strong parameter efficiency (see the grouping-accuracy sketch after this list).
- A real-world validation parses over 600M production logs from the Frontier supercomputer in four weeks, identifying temporal dynamics, node-level anomalies, and correlations between workload and error logs.
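The fine-tuning described above pairs raw log lines with their ground-truth templates as instruction-following examples. The sketch below shows roughly what such a training record could look like, assuming a simple instruction/input/output JSON layout; the prompt wording, field names, and the `<*>` placeholder convention are illustrative assumptions, not the paper's exact format.

```python
# Hypothetical sketch of an instruction-tuning record for log-template extraction.
# The prompt text, JSON fields, and <*> placeholder are assumptions for illustration.
import json

def make_instruction_example(raw_log: str, template: str) -> dict:
    """Pair a raw log line with its ground-truth template as an
    instruction/response record suitable for supervised fine-tuning."""
    return {
        "instruction": (
            "Extract the log template by replacing variable fields "
            "(IDs, addresses, counts, timestamps) with the placeholder <*>."
        ),
        "input": raw_log,
        "output": template,
    }

if __name__ == "__main__":
    example = make_instruction_example(
        raw_log="node c7-3 reported GPU Xid error 79 on device 0000:47:00.0",
        template="node <*> reported GPU Xid error <*> on device <*>",
    )
    print(json.dumps(example, indent=2))
```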
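For the LogHub comparison, log-parsing work commonly scores grouping accuracy: a message counts as correct only if the set of messages sharing its predicted template exactly matches the set sharing its ground-truth template. The sketch below implements that metric as an assumption about how parsing accuracy is measured here; the function name and scoring choice are not taken from the paper.

```python
# Minimal sketch of the grouping-accuracy metric often used on LogHub benchmarks.
# Assumed evaluation, not the paper's own scoring code.
from collections import defaultdict

def grouping_accuracy(predicted: list[str], ground_truth: list[str]) -> float:
    """A message is correct only if the full set of messages sharing its
    predicted template equals the set sharing its ground-truth template."""
    pred_groups, true_groups = defaultdict(set), defaultdict(set)
    for idx, (p, t) in enumerate(zip(predicted, ground_truth)):
        pred_groups[p].add(idx)
        true_groups[t].add(idx)
    correct = sum(
        len(members)
        for members in true_groups.values()
        if members == pred_groups[predicted[next(iter(members))]]
    )
    return correct / len(ground_truth)

if __name__ == "__main__":
    pred = ["T1", "T1", "T2", "T3"]
    true = ["A", "A", "B", "B"]
    print(grouping_accuracy(pred, true))  # 0.5: group A matches, group B is split
```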




