OmniTrace: A Unified Framework for Generation-Time Attribution in Omni-Modal LLMs
arXiv cs.AI / 4/16/2026
Key Points
- OmniTrace is proposed as a lightweight, model-agnostic framework that attributes, at generation time, which multimodal inputs (text, image, audio, video) support each statement produced by omni-modal, decoder-only LLMs.
- The method reframes attribution as a generation-time tracing problem over the causal decoding process, converting token-level attribution signals (e.g., attention- or gradient-based scores) into coherent span-level, cross-modal explanations.
- It aggregates the traced signals into semantically meaningful spans using confidence-weighted, temporally coherent strategies, enabling concise selection of supporting sources without retraining or additional supervision (see the sketch after this list).
- Experiments on Qwen2.5-Omni and MiniCPM-o-4.5 for visual, audio, and video tasks show more stable and interpretable attribution than self-attribution and embedding-based baselines.
- Results also indicate robustness across multiple underlying token-level attribution signals, supporting the idea that structured generation-time tracing can scale multimodal transparency.
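To make the span-aggregation step concrete, here is a minimal sketch of how per-token attribution scores over multimodal input segments could be confidence-weighted, smoothed for temporal coherence, and collapsed into contiguous supporting spans. Everything here is an illustrative assumption rather than the paper's actual implementation: the function name `trace_spans`, its signature, the entropy-based confidence weighting, and the exponential smoothing are all stand-ins for whatever strategies OmniTrace uses; only NumPy is assumed.

```python
import numpy as np

def trace_spans(token_scores, modality_ids, threshold=0.5, smooth=0.8):
    """Aggregate per-token attribution scores into contiguous,
    modality-tagged spans (hypothetical sketch, not the paper's API).

    token_scores : (T, S) array of nonnegative attribution weights from
                   each of T generated tokens to S input segments
                   (e.g., attention- or gradient-based scores).
    modality_ids : length-S list tagging each input segment with its
                   modality ("text", "image", "audio", "video").
    """
    # Confidence weighting (assumed strategy): trust tokens whose
    # attribution mass is concentrated (low entropy) over diffuse ones.
    probs = token_scores / token_scores.sum(axis=1, keepdims=True)
    entropy = -(probs * np.log(probs + 1e-9)).sum(axis=1)
    confidence = 1.0 - entropy / np.log(probs.shape[1])  # assumes S > 1

    # Confidence-weighted mean attribution per input segment.
    seg_scores = (confidence[:, None] * probs).sum(axis=0) / confidence.sum()

    # Temporal coherence (assumed strategy): exponentially smooth scores
    # along the input sequence so isolated spikes do not fragment spans.
    smoothed = seg_scores.copy()
    for s in range(1, len(smoothed)):
        smoothed[s] = smooth * smoothed[s - 1] + (1 - smooth) * seg_scores[s]

    # Select contiguous above-threshold runs as supporting spans,
    # tagged with the modality of the segment that opens the run.
    spans, start = [], None
    cutoff = threshold * smoothed.max()
    for s, score in enumerate(smoothed):
        if score >= cutoff and start is None:
            start = s
        elif score < cutoff and start is not None:
            spans.append((start, s - 1, modality_ids[start]))
            start = None
    if start is not None:
        spans.append((start, len(smoothed) - 1, modality_ids[start]))
    return spans

# Toy example: 4 generated tokens attending over 6 input segments.
scores = np.random.rand(4, 6)
tags = ["text", "text", "image", "image", "audio", "audio"]
print(trace_spans(scores, tags))  # e.g., [(2, 4, "image")]
```

The point of the sketch is the pipeline shape the key points describe: raw token-level signals in, a small set of span-level, modality-tagged attributions out, with no retraining of the underlying model.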