Multimodal QUD: Inquisitive Questions from Scientific Figures
arXiv cs.CL / 4/28/2026
Key Points
- The paper introduces "Multimodal QUD," a framework for generating deeper, inquisitive questions that are grounded in both a figure and the surrounding paper context rather than in text-only cues.
- It extends the Questions Under Discussion (QUD) framework from text-only discourse to multimodal discourse, modeling how implicit questions arise and are resolved as a reader works through a paper.
- The authors release MQUD, a dataset of research papers where such implicit questions are made explicit and annotated by the original authors.
- Experiments show that fine-tuning a vision-language model (VLM) on MQUD improves its ability to produce content-specific, visually grounded questions that require higher-level reasoning (a rough sketch of this question-generation setup follows the list).
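
As a rough illustration of the task the fine-tuned VLM performs (not the paper's actual pipeline), the sketch below prompts an off-the-shelf LLaVA-style model through Hugging Face transformers to produce a single figure-grounded inquisitive question from a figure image plus surrounding paper text. The model id, prompt wording, and the `generate_inquisitive_question` helper are illustrative assumptions; the paper's real fine-tuning data format and model choice are not reproduced here.

```python
# Minimal sketch: figure-grounded inquisitive-question generation with a
# stock vision-language model. Model id, prompt format, and field names are
# assumptions for illustration, not the paper's setup.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_ID = "llava-hf/llava-1.5-7b-hf"  # assumed stand-in for "a VLM"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

def generate_inquisitive_question(figure_path: str, paper_context: str) -> str:
    """Ask the VLM for one implicit, discussion-driving question that the
    figure raises given the surrounding paper context (QUD-style elicitation)."""
    image = Image.open(figure_path).convert("RGB")
    prompt = (
        "USER: <image>\n"
        f"Paper context: {paper_context}\n"
        "What deeper question does this figure raise that the text has not "
        "yet answered? Reply with a single question. ASSISTANT:"
    )
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(
        model.device, torch.float16
    )
    output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    # Decode only the newly generated tokens (the question itself).
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return processor.decode(new_tokens, skip_special_tokens=True).strip()

# Example call with hypothetical inputs:
# q = generate_inquisitive_question("fig3.png", "We ablate the retrieval module ...")
# print(q)
```

Fine-tuning on MQUD would replace this zero-shot prompt with supervised pairs of (figure, paper context, author-written question), but the inference-time interface would look much the same.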