Vision-Language Based Expert Reporting for Painting Authentication and Defect Detection

arXiv cs.CV / 3/17/2026

Key Points

  • The paper presents a fully automated thermography-vision-language model (VLM) framework that combines multi-modal pulsed active infrared thermography (AIRT) analysis with structured natural-language reporting for painting authentication and defect detection, with no human intervention during inference.
  • It processes thermal sequences using Principal Component Thermography (PCT), Thermographic Signal Reconstruction (TSR), and Pulsed Phase Thermography (PPT), then fuses the resulting anomaly masks into a consensus segmentation that emphasizes regions supported by multiple thermal indicators and guides the VLM’s reporting.
  • The VLM generates structured reports describing anomaly location, thermal behavior, and plausible physical interpretations, while explicitly acknowledging uncertainty and diagnostic limitations so that conservation decisions remain explainable.
  • Evaluation on two marquetry samples shows consistent anomaly detection and stable structured interpretations, which the authors take as evidence of reproducibility and generalizability across samples and which suggests potential for standardized documentation in cultural heritage contexts.
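
To make the three processing techniques named above more concrete, the following is a minimal NumPy sketch of what each one computes from a thermal sequence. It is an illustration under stated assumptions, not the paper's implementation: the function names, the `(frames, height, width)` array layout, the polynomial degree, and the choice of frequency bin are all hypothetical, and the summary does not state which parameters the authors use.

```python
import numpy as np

def pct_components(seq, n_components=3):
    """PCT sketch: SVD of the standardized (frames x pixels) matrix.

    Returns the leading spatial patterns (empirical orthogonal functions).
    """
    t, h, w = seq.shape
    data = seq.reshape(t, h * w).astype(float)
    # Standardize each pixel's temporal profile (assumed preprocessing).
    data = data - data.mean(axis=0)
    std = data.std(axis=0)
    data = data / np.where(std > 0, std, 1.0)
    _, _, vt = np.linalg.svd(data, full_matrices=False)
    return vt[:n_components].reshape(n_components, h, w)

def tsr_coefficients(seq, times, degree=4):
    """TSR sketch: per-pixel polynomial fit of log(T) against log(t).

    `times` must be strictly positive; temperatures are clipped to stay positive.
    """
    t, h, w = seq.shape
    log_t = np.log(np.asarray(times, dtype=float))
    log_temp = np.log(np.clip(seq.reshape(t, h * w), 1e-12, None))
    coeffs = np.polyfit(log_t, log_temp, degree)  # shape (degree + 1, pixels)
    return coeffs.reshape(degree + 1, h, w)

def ppt_phase(seq, freq_bin=1):
    """PPT sketch: phase image of one frequency bin of the temporal FFT."""
    spectrum = np.fft.rfft(seq, axis=0)
    return np.angle(spectrum[freq_bin])
```

Each function returns per-pixel feature images. Turning those into anomaly masks requires a further thresholding or segmentation step, which this summary does not describe.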

Abstract

Authenticity and condition assessment are central to conservation decision-making, yet interpretation and reporting of thermographic output remain largely bespoke and expert-dependent, complicating comparison across collections and limiting systematic integration into conservation documentation. Pulsed Active Infrared Thermography (AIRT) is sensitive to subsurface features such as material heterogeneity, voids, and past interventions; however, its broader adoption is constrained by artifact misinterpretation, inter-laboratory variability, and the absence of standardized, explainable reporting frameworks. Although multi-modal thermographic processing techniques are established, their integration with structured natural-language interpretation has not been explored in cultural heritage. A fully automated thermography-vision-language model (VLM) framework is presented. It combines multi-modal AIRT analysis with modality-aware textual reporting, without human intervention during inference. Thermal sequences are processed using Principal Component Thermography (PCT), Thermographic Signal Reconstruction (TSR), and Pulsed Phase Thermography (PPT), and the resulting anomaly masks are fused into a consensus segmentation that emphasizes regions supported by multiple thermal indicators while mitigating boundary artifacts. The fused evidence is provided to a VLM, which generates structured reports describing the location of the anomaly, thermal behavior, and plausible physical interpretations while explicitly acknowledging the uncertainty and diagnostic limitations. Evaluation on two marquetries demonstrates consistent anomaly detection and stable structured interpretations, indicating reproducibility and generalizability across samples.
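
The consensus step described in the abstract can be sketched as a per-pixel vote across the three modality masks, followed by clearing a thin frame at the image border. This is only one plausible reading: the abstract does not state the actual fusion rule, so the majority-vote threshold (`min_votes`), the border width (`border`), and the function name `fuse_masks` are assumptions made for illustration.

```python
import numpy as np

def fuse_masks(masks, min_votes=2, border=1):
    """Fuse binary anomaly masks by voting (assumed rule, not the paper's).

    masks: iterable of equally shaped 2-D boolean arrays, one per modality.
    Returns (consensus, votes): a boolean mask and the per-pixel vote count.
    """
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])
    votes = stack.sum(axis=0)
    # Keep pixels flagged by at least `min_votes` modalities.
    consensus = votes >= min_votes
    if border > 0:
        # Suppress detections along the frame, where edge effects are common.
        consensus[:border, :] = False
        consensus[-border:, :] = False
        consensus[:, :border] = False
        consensus[:, -border:] = False
    return consensus, votes
```

Returning the vote count alongside the mask lets a downstream report distinguish regions supported by all three indicators from those supported by only two, which fits the abstract's emphasis on acknowledging uncertainty.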