Ego-Grounding for Personalized Question-Answering in Egocentric Videos
arXiv cs.CV / 4/3/2026
Key Points
- The paper presents the first systematic evaluation of multimodal LLMs (MLLMs) for personalized question answering in egocentric (camera-wearer) videos, focusing on their ability to perform “ego-grounding.”
- It introduces MyEgo, a new egocentric VideoQA dataset with 541 long videos and about 5K personalized questions about “my things,” “my activities,” and “my past,” and benchmarks multiple MLLM variants on it.
- Results show that even the strongest closed- and open-source MLLMs (e.g., GPT-5 and Qwen3-VL) perform poorly on MyEgo, reaching roughly 46% (closed) and 36% (open) accuracy and falling far short of human performance.
- The study finds that explicit reasoning and larger model scale do not consistently improve performance; providing relevant evidence helps, but the gains diminish over time, suggesting weaknesses in tracking the “me” identity and in long-range memory of past context (a minimal sketch of evidence-conditioned prompting follows this list).
- The authors conclude that ego-grounding and long-range memory are key missing capabilities for personalized egocentric assistance, and they release the dataset and code to spur further research.
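To make the evidence finding concrete, here is a minimal, hypothetical sketch of evidence-conditioned prompting: sample a few frames around a retrieved evidence timestamp and pass them to an MLLM alongside a personalized question. This is not the paper's protocol; the model name (`gpt-4o` as a stand-in), the OpenAI-compatible API, the frame-sampling window, and the prompt wording are all assumptions.

```python
# Hypothetical sketch of evidence-conditioned prompting for personalized
# egocentric VideoQA. Model choice, frame sampling, and prompt wording are
# assumptions, not the paper's actual evaluation protocol.
import base64
import cv2  # pip install opencv-python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def frames_around(video_path: str, t_sec: float, n: int = 4,
                  step_sec: float = 1.0) -> list[str]:
    """Grab n base64-encoded JPEG frames centered on an evidence timestamp."""
    cap = cv2.VideoCapture(video_path)
    jpegs = []
    for i in range(n):
        t = max(0.0, t_sec + (i - n // 2) * step_sec)
        cap.set(cv2.CAP_PROP_POS_MSEC, t * 1000)
        ok, frame = cap.read()
        if ok:
            ok, buf = cv2.imencode(".jpg", frame)
            if ok:
                jpegs.append(base64.b64encode(buf.tobytes()).decode())
    cap.release()
    return jpegs


def ask_with_evidence(video_path: str, question: str, evidence_t: float) -> str:
    """Ask a personalized ('my things' / 'my past') question, conditioning the
    model on frames near the retrieved evidence instead of the whole video."""
    content = [{"type": "text",
                "text": f"You are watching my egocentric video. {question}"}]
    for b64 in frames_around(video_path, evidence_t):
        content.append({"type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64}"}})
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder; the paper evaluates GPT-5 among others
        messages=[{"role": "user", "content": content}],
    )
    return resp.choices[0].message.content


# Example: a "my things" question with retrieved evidence at ~12m30s.
# print(ask_with_evidence("day.mp4", "Where did I leave my keys?", 750.0))
```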