Crowdsourcing of Real-world Image Annotation via Visual Properties

arXiv cs.CV / 4/17/2026

💬 Opinion · Tools & Practical Usage · Models & Research

Key Points

  • The paper addresses the “semantic gap” in object recognition datasets: mappings between visual data and linguistic descriptions are complex and many-to-many, which biases model performance in computer vision.
  • It proposes an image annotation approach that combines knowledge representation, natural language processing, and computer vision, using visual property constraints to reduce annotator subjectivity.
  • An interactive crowdsourcing framework is introduced that asks dynamically generated questions guided by a predefined object category hierarchy and real-time annotator feedback.
  • Experiments indicate that the proposed methodology is effective, and the authors analyze annotator feedback to further optimize the crowdsourcing setup.
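The interactive framework described above can be pictured as a walk down the object category hierarchy, where each step asks the annotator a visual-property question and the answer selects the next branch. The following sketch is purely illustrative: the tree contents, question wording, and function names are assumptions, not the authors' actual implementation.

```python
# Hypothetical fragment of an object category hierarchy keyed by
# visual-property questions (illustrative only; not from the paper).
CATEGORY_TREE = {
    "question": "Does the object have fur?",
    "yes": {"question": "Does it have four legs?", "yes": "dog", "no": "monkey"},
    "no": "bird",
}

def annotate(node, answer_fn):
    """Walk the question tree, asking visual-property questions of the
    annotator (answer_fn) until a leaf category label is reached."""
    while isinstance(node, dict):
        branch = "yes" if answer_fn(node["question"]) else "no"
        node = node[branch]
    return node

# Example: a scripted annotator who answers "yes" to every question.
label = annotate(CATEGORY_TREE, lambda question: True)
print(label)  # -> dog
```

Constraining each question to an observable visual property, as the paper proposes, narrows the space of acceptable answers at every node and thereby limits annotator subjectivity compared with free-form labeling.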

Abstract

Recent advances in data-centric artificial intelligence highlight inherent limitations in object recognition datasets. One of the primary issues stems from the semantic gap problem, which results in complex many-to-many mappings between visual data and linguistic descriptions. This bias adversely affects performance in computer vision tasks. This paper proposes an image annotation methodology that integrates knowledge representation, natural language processing, and computer vision techniques, aiming to reduce annotator subjectivity by applying visual property constraints. We introduce an interactive crowdsourcing framework that dynamically asks questions based on a predefined object category hierarchy and annotator feedback, guiding image annotation by visual properties. Experiments demonstrate the effectiveness of this methodology, and annotator feedback is discussed to optimize the crowdsourcing setup.