HYVE: Hybrid Views for LLM Context Engineering over Machine Data

arXiv cs.AI / 4/8/2026


Key Points

  • The paper introduces HYVE, a framework for LLM context engineering aimed at handling long, nested, repetitive machine-data payloads (e.g., logs/telemetry with JSON or AST-like structure).
  • HYVE uses a request-scoped datastore with schema information and performs preprocessing to detect repetitive structure, create hybrid column/row views, and expose only the most relevant representation to the LLM.
  • It provides postprocessing options including direct output return, datastore-backed recovery of omitted information, or a bounded additional LLM call for SQL-augmented semantic synthesis.
  • Evaluations across knowledge QA, chart generation, anomaly detection, and network troubleshooting show major efficiency gains (50–90% token reduction) and task improvements, including up to 132% better chart accuracy and up to 83% lower latency.
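To make the "hybrid column/row views" idea concrete, here is a minimal sketch, assuming a payload of homogeneous JSON records; the function name and rendering are illustrative, not HYVE's actual API. It pivots row-oriented records into a columnar view so each repeated key appears once instead of once per record, which is the kind of restructuring that drives the token reduction reported above.

```python
import json

def to_columnar(records):
    """Pivot a list of homogeneous JSON records into a columnar view:
    a dict mapping each key to the list of its per-record values, so
    repeated keys appear once rather than once per record."""
    columns = {}
    for rec in records:
        for key, value in rec.items():
            columns.setdefault(key, []).append(value)
    return columns

# A repetitive log-like payload: the same keys recur in every record.
payload = [
    {"ts": 1, "level": "INFO", "msg": "start"},
    {"ts": 2, "level": "WARN", "msg": "slow"},
    {"ts": 3, "level": "INFO", "msg": "done"},
]

view = to_columnar(payload)
# Serializing the columnar view emits each key once, so the prompt
# representation is shorter than the raw row-oriented JSON.
compact = json.dumps(view)
raw = json.dumps(payload)
```

A real system would also detect *where* such repetitive structure occurs inside a mixed natural-language/structured input and choose per-payload whether the columnar or row view is the more relevant representation to expose.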

Abstract

Machine data is central to observability and diagnosis in modern computing systems, appearing in logs, metrics, telemetry traces, and configuration snapshots. When provided to large language models (LLMs), this data typically arrives as a mixture of natural language and structured payloads such as JSON or Python/AST literals. Yet LLMs remain brittle on such inputs, particularly when they are long, deeply nested, and dominated by repetitive structure. We present HYVE (HYbrid ViEw), a framework for LLM context engineering for inputs containing large machine-data payloads, inspired by database management principles. HYVE surrounds model invocation with coordinated preprocessing and postprocessing, centered on a request-scoped datastore augmented with schema information. During preprocessing, HYVE detects repetitive structure in raw inputs, materializes it in the datastore, transforms it into hybrid columnar and row-oriented views, and selectively exposes only the most relevant representation to the LLM. During postprocessing, HYVE either returns the model output directly, queries the datastore to recover omitted information, or performs a bounded additional LLM call for SQL-augmented semantic synthesis. We evaluate HYVE on diverse real-world workloads spanning knowledge QA, chart generation, anomaly detection, and multi-step network troubleshooting. Across these benchmarks, HYVE reduces token usage by 50-90% while maintaining or improving output quality. On structured generation tasks, it improves chart-generation accuracy by up to 132% and reduces latency by up to 83%. Overall, HYVE offers a practical approximation to an effectively unbounded context window for prompts dominated by large machine-data payloads.
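The request-scoped datastore and SQL-backed postprocessing described above can be sketched with an in-memory SQLite database; table and column names here are assumptions for illustration, not HYVE's actual schema. The idea is that fields omitted from the LLM prompt remain queryable, so postprocessing can recover them without a second model call.

```python
import sqlite3

# Materialize the repetitive payload into a request-scoped datastore:
# an in-memory SQLite table that lives only for this request.
records = [
    (1, "INFO", "start"),
    (2, "WARN", "slow"),
    (3, "INFO", "done"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts INTEGER, level TEXT, msg TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", records)

# Suppose the prompt exposed only the "level" column and the model
# flagged the WARN entries; a postprocessing SQL query recovers the
# timestamps and messages that were omitted from the context.
rows = conn.execute(
    "SELECT ts, msg FROM events WHERE level = 'WARN'"
).fetchall()
```

The bounded "SQL-augmented semantic synthesis" path would instead feed such query results back into one additional LLM call, keeping the extra context small and structured.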