Vision-Language Based Expert Reporting for Painting Authentication and Defect Detection

arXiv cs.CV / 3/17/2026

Key Points

The paper presents a fully automated thermography-vision-language model (VLM) framework that runs without human intervention during inference, combining multi-modal pulsed active infrared thermography (AIRT) analysis with structured natural-language reporting for painting authentication and defect detection.
It processes thermal sequences using Principal Component Thermography (PCT), Thermographic Signal Reconstruction (TSR), and Pulsed Phase Thermography (PPT), fusing anomaly masks into a consensus segmentation that guides the VLM’s reporting.
The VLM generates structured reports that describe anomaly location, thermal behavior, and plausible physical interpretations, and that explicitly state uncertainty and diagnostic limitations to support explainable conservation decisions.
Evaluations on two marquetries show consistent anomaly detection and stable, generalizable interpretations, indicating reproducibility across samples and potential for standardized documentation in cultural heritage contexts.
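
The three processing techniques named above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names, the `(T, H, W)` array layout, the polynomial degree, and the choice of frequency bin are all assumptions made for illustration, and the input is assumed to be the temperature rise above the pre-pulse baseline.

```python
import numpy as np

def pct_components(seq, n_components=3):
    """PCT sketch: spatial empirical orthogonal functions via SVD.

    seq: (T, H, W) thermal sequence (hypothetical layout).
    Returns an array of shape (n_components, H, W).
    """
    T, H, W = seq.shape
    X = seq.reshape(T, H * W).astype(float)
    X = X - X.mean(axis=0, keepdims=True)  # remove each pixel's temporal mean
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:n_components].reshape(-1, H, W)

def ppt_phase(seq, freq_bin=1):
    """PPT sketch: per-pixel phase of one frequency bin of the temporal FFT."""
    spectrum = np.fft.rfft(seq, axis=0)  # shape (T // 2 + 1, H, W)
    return np.angle(spectrum[freq_bin])

def tsr_coefficients(seq, times, degree=4, eps=1e-6):
    """TSR sketch: per-pixel polynomial fit of ln(dT) against ln(t).

    times must be strictly positive; values of seq are clipped to eps
    before the logarithm. Returns coefficients of shape (degree + 1, H, W),
    highest order first (the np.polyfit convention).
    """
    T, H, W = seq.shape
    log_t = np.log(np.asarray(times, dtype=float))
    log_dT = np.log(np.clip(seq.reshape(T, H * W).astype(float), eps, None))
    coeffs = np.polyfit(log_t, log_dT, degree)  # fits all pixels at once
    return coeffs.reshape(degree + 1, H, W)
```

Each function returns one or more per-pixel feature maps; how the paper turns such maps into anomaly masks (for example, by thresholding) is not specified in this summary.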

Abstract

Authenticity and condition assessment are central to conservation decision-making, yet interpretation and reporting of thermographic output remain largely bespoke and expert-dependent, complicating comparison across collections and limiting systematic integration into conservation documentation. Pulsed Active Infrared Thermography (AIRT) is sensitive to subsurface features such as material heterogeneity, voids, and past interventions; however, its broader adoption is constrained by artifact misinterpretation, inter-laboratory variability, and the absence of standardized, explainable reporting frameworks. Although multi-modal thermographic processing techniques are established, their integration with structured natural-language interpretation has not been explored in cultural heritage. A fully automated thermography-vision-language model (VLM) framework is presented. It combines multi-modal AIRT analysis with modality-aware textual reporting, without human intervention during inference. Thermal sequences are processed using Principal Component Thermography (PCT), Thermographic Signal Reconstruction (TSR), and Pulsed Phase Thermography (PPT), and the resulting anomaly masks are fused into a consensus segmentation that emphasizes regions supported by multiple thermal indicators while mitigating boundary artifacts. The fused evidence is provided to a VLM, which generates structured reports describing the location of the anomaly, thermal behavior, and plausible physical interpretations while explicitly acknowledging the uncertainty and diagnostic limitations. Evaluation on two marquetries demonstrates consistent anomaly detection and stable structured interpretations, indicating reproducibility and generalizability across samples.
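
The consensus fusion described in the abstract can be pictured as a per-pixel vote across the modality masks. The sketch below is one plausible reading, not the authors' method: the voting rule, the `min_votes` threshold, and the `border` margin used to suppress edge artifacts are hypothetical parameters introduced for illustration.

```python
import numpy as np

def consensus_mask(masks, min_votes=2, border=0):
    """Fuse binary anomaly masks by per-pixel voting (illustrative only).

    masks: iterable of (H, W) boolean arrays, e.g. one each from PCT, TSR, PPT.
    min_votes: number of modalities that must agree (assumed rule).
    border: width in pixels of an image margin to clear (assumed heuristic
            for boundary artifacts).
    Returns (fused, votes): the boolean consensus mask and the vote counts.
    """
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])
    votes = stack.sum(axis=0)
    fused = votes >= min_votes
    if border > 0:
        fused[:border, :] = False
        fused[-border:, :] = False
        fused[:, :border] = False
        fused[:, -border:] = False
    return fused, votes
```

Returning the vote counts alongside the fused mask keeps the per-region level of agreement available, which is the kind of evidence a downstream report could cite when stating uncertainty.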
