SAKE: Self-aware Knowledge Exploitation-Exploration for Grounded Multimodal Named Entity Recognition
arXiv cs.CL / 4/23/2026
💬 Opinion · Models & Research
Key Points
- The paper targets Grounded Multimodal Named Entity Recognition (GMNER), which must identify named entities and localize their corresponding visual regions from image-text pairs in open-world social media settings.
- It argues that prior methods over-rely on either noisy heuristic retrieval (hurting precision on known entities) or internal LLM-based refinement (limited by model knowledge and prone to hallucinations).
- The proposed SAKE framework combines internal “knowledge exploitation” with external “knowledge exploration” using self-aware reasoning and adaptive invocation of search tools.
- SAKE is trained in two stages: difficulty-aware search tag generation to produce entity-level uncertainty signals, and SAKE-SeCoT supervised fine-tuning to teach self-awareness and tool use.
- Experiments on two social media benchmarks show SAKE outperforms prior methods; agentic reinforcement learning with rewards that discourage unnecessary retrieval teaches the model to decide when a search is actually worthwhile.
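The last point describes a reward that trades off answer quality against retrieval cost. A minimal sketch of that idea (not the paper's actual reward; the function name, the `correct`/`num_searches` inputs, and the penalty value are all illustrative assumptions):

```python
# Illustrative sketch of a retrieval-aware reward: score the prediction,
# then subtract a small penalty per search call so the policy learns to
# invoke search tools only when they are likely to help.
def retrieval_aware_reward(correct: bool, num_searches: int,
                           search_penalty: float = 0.1) -> float:
    """Base reward for a correct prediction, minus a per-search cost."""
    base = 1.0 if correct else 0.0
    return base - search_penalty * num_searches

# A correct answer found without searching earns more than one that
# needed two look-ups, and an incorrect answer that searched is
# penalized below zero.
```

Under such a reward, an agent that can answer a well-known entity from internal knowledge is pushed to skip retrieval, which matches the framework's "exploitation first, exploration when uncertain" behavior.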