View-oriented Conversation Compiler for Agent Trace Analysis

arXiv cs.AI / 4/1/2026


Key Points

  • The paper argues that agent trace analysis quality often degrades when complex, nested agent conversations are fed to reflectors in plain or loosely structured formats such as raw text, JSON, YAML, or grep output.
  • It introduces VCC (View-oriented Conversation Compiler), which lexes/parses agent JSONL logs and emits multiple structured “views” including a lossless full transcript view, a user-perceived UI view, and an adaptive projection view driven by a relevance predicate.
  • In context-learning experiments on AppWorld, switching only the reflector’s input from raw JSONL to VCC-compiled views improves pass rates across all three tested model configurations.
  • The approach also reduces reflector token usage by roughly one-half to two-thirds and yields more concise learned memory, indicating that message formatting is key infrastructure for context learning.
  • Overall, the results suggest that conversation/trace message layout and view compilation materially affect downstream analytic and learning performance, beyond being a mere engineering detail.
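To make the adaptive view concrete, the sketch below shows one plausible reading of a structure-preserving projection driven by a relevance predicate: relevant records are emitted in full, everything else is elided to a stub so that ordering, roles, and sub-agent nesting survive. All names here (`compile_adaptive_view`, `is_relevant`, the record schema with `role`, `content`, and `depth` fields) are illustrative assumptions, not the paper's actual API.

```python
import json

def is_relevant(msg: dict) -> bool:
    """Example predicate (hypothetical): keep tool records whose
    content mentions an error."""
    text = json.dumps(msg.get("content", ""))
    return msg.get("role") == "tool" and "error" in text.lower()

def compile_adaptive_view(records: list[dict], relevant=is_relevant) -> list[str]:
    """Emit a line-oriented view: relevant records in full, all others
    elided to a one-line stub, so the transcript's structure (order,
    roles, nesting depth) is preserved for the reflector."""
    lines = []
    for i, msg in enumerate(records, start=1):  # indices mirror the full view
        indent = "  " * msg.get("depth", 0)     # sub-agent nesting, if recorded
        if relevant(msg):
            lines.append(f"{indent}[{i}] {msg['role']}: {json.dumps(msg['content'])}")
        else:
            lines.append(f"{indent}[{i}] {msg['role']}: …elided…")
    return lines

# Tiny example trace (invented for illustration)
trace = [
    {"role": "user", "content": "Book a flight"},
    {"role": "assistant", "content": "Calling flight API", "depth": 0},
    {"role": "tool", "content": {"status": "error", "msg": "auth failed"}, "depth": 1},
]
print("\n".join(compile_adaptive_view(trace)))
```

Because the elided stubs keep the same positions as the lossless full view, a reflector can still cite stable line coordinates while reading far fewer tokens, which is consistent with the token savings the paper reports.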

Abstract

Agent traces carry increasing analytical value in the era of context learning and harness-driven agentic cognition, yet most prior work treats conversation format as a trivial engineering detail. Modern agent conversations contain deeply structured content, including nested tool calls and results, chain-of-thought reasoning blocks, sub-agent invocations, context-window compaction boundaries, and harness-injected system directives, whose complexity far exceeds that of simple user-assistant exchanges. Feeding such traces to a reflector or other analytical mechanism as plain text, JSON, YAML, or grep output can materially degrade analysis quality. This paper presents VCC (View-oriented Conversation Compiler), a compiler pipeline (lex, parse, IR, lower, emit) that transforms raw agent JSONL logs into a family of structured views: a full view (a lossless transcript serving as the canonical line-number coordinate system), a user-interface view (reconstructing the interaction as the user actually perceived it), and an adaptive view (a structure-preserving projection governed by a relevance predicate). In a context-learning experiment on AppWorld, replacing only the reflector's input format (from raw JSONL to VCC-compiled views) leads to higher pass rates across all three model configurations tested, while cutting reflector token consumption by one-half to two-thirds and producing more concise learned memory. These results suggest that message format functions as infrastructure for context learning, not as an incidental implementation choice.