A Proposal-Free Query-Guided Network for Grounded Multimodal Named Entity Recognition

arXiv cs.CV / 3/19/2026



Key Points

The paper proposes a proposal-free Query-Guided Network (QGN) for Grounded Multimodal Named Entity Recognition (GMNER), unifying multimodal reasoning and decoding through text guidance and cross-modal interaction.
It critiques two-step GMNER approaches that first detect objects with a pre-trained general-purpose detector and then match entities to the detected objects; because the detector runs independently of the text, it can miss the fine-grained regions required for accurate grounding.
QGN eliminates external proposals and reports robust open-domain grounding, with top performance among the compared GMNER models on widely used benchmarks.
Extensive experiments demonstrate QGN's effectiveness and potential to improve grounding accuracy in real-world GMNER applications.

Abstract

Grounded Multimodal Named Entity Recognition (GMNER) identifies named entities, including their spans and types, in natural language text and grounds them to the corresponding regions in associated images. Most existing approaches split this task into two steps: they first detect objects using a pre-trained general-purpose detector and then match named entities to the detected objects. However, these methods face a major limitation. Because pre-trained general-purpose object detectors operate independently of textual entities, they tend to detect common objects and frequently overlook specific fine-grained regions required by named entities. This misalignment between object detectors and entities introduces imprecision and can impair overall system performance. In this paper, we propose a proposal-free Query-Guided Network (QGN) that unifies multimodal reasoning and decoding through text guidance and cross-modal interaction. QGN enables accurate grounding and robust performance in open-domain scenarios. Extensive experiments demonstrate that QGN achieves top performance among compared GMNER models on widely used benchmarks.
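
To make the limitation of the two-step pipeline concrete, the following is a minimal sketch and not the paper's method or code. It assumes a GMNER prediction is a (span, type, box) triple and that grounding quality is judged by intersection-over-union (IoU) against the region the entity needs. The names GroundedEntity, iou, and best_proposal_iou, the box format (x1, y1, x2, y2), and all coordinates are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels, with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


@dataclass
class GroundedEntity:
    """Hypothetical container for one GMNER output: span, type, region."""
    span: Tuple[int, int]   # token indices, start inclusive, end exclusive
    entity_type: str        # e.g. "ORG", "PER"
    box: Box                # image region the entity is grounded to


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def best_proposal_iou(target: Box, proposals: List[Box]) -> float:
    """Upper bound on grounding quality when the answer must be a proposal."""
    return max((iou(target, p) for p in proposals), default=0.0)


# Illustrative case: the entity refers to a small logo, but the
# general-purpose detector only proposed the whole person.
logo = GroundedEntity(span=(3, 4), entity_type="ORG", box=(40, 30, 60, 50))
detector_proposals = [(0, 0, 100, 200)]

ceiling = best_proposal_iou(logo.box, detector_proposals)
print(f"best achievable IoU from proposals: {ceiling:.2f}")
```

In this made-up case the best proposal overlaps the needed region with an IoU of only 0.02, so no matching step can recover an accurate box however well it aligns text and proposals. A proposal-free model is not bound by this ceiling because it does not have to choose among detector outputs.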

