From Boundaries to Semantics: Prompt-Guided Multi-Task Learning for Petrographic Thin-section Segmentation

arXiv cs.CV / 4/17/2026

📰 News · Models & Research

Key Points

  • The paper addresses joint grain-edge segmentation (GES) and lithology semantic segmentation (LSS), two tasks that are typically handled separately even though tackling them jointly enables a more complete quantification of rock fabric and composition.
  • Although the Segment Anything Model (SAM) shows strong boundary alignment, the authors argue that directly adapting it to petrographic thin-section images is difficult due to severe domain gaps caused by extinction-dependent color variation and ultra-fine grain boundaries.
  • The proposed Petro-SAM is a two-stage, prompt-guided multi-task framework built on SAM, designed to improve joint performance on multi-angle petrographic image stacks.
  • Petro-SAM uses a Merge Block to integrate seven polarized views, mitigating extinction-related color variation, and adds multi-scale feature fusion plus color-entropy priors to refine the segmentation results (see the sketch after this list).
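
The paper's code is not reproduced here, so the following is a minimal PyTorch sketch of one plausible Merge Block design rather than the authors' implementation: the module name, tensor shapes, and per-view weighted-sum fusion are all assumptions.

```python
# Hypothetical sketch of a Merge Block; names, shapes, and the fusion
# strategy are assumptions, not the paper's published implementation.
import torch
import torch.nn as nn

class MergeBlock(nn.Module):
    """Fuse V polarized views of the same thin section into one feature map.

    One plausible design: embed each view independently, predict a per-view,
    per-pixel relevance score, and take the softmax-weighted sum, so
    extinction-darkened views contribute less at pixels where they carry
    little signal.
    """

    def __init__(self, in_channels: int = 3, embed_dim: int = 64):
        super().__init__()
        self.embed = nn.Conv2d(in_channels, embed_dim, kernel_size=3, padding=1)
        self.score = nn.Conv2d(embed_dim, 1, kernel_size=1)  # per-view relevance

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (B, V, C, H, W) stack of V polarization angles
        b, v, c, h, w = views.shape
        feats = self.embed(views.flatten(0, 1))           # (B*V, D, H, W)
        scores = self.score(feats).view(b, v, 1, h, w)    # (B, V, 1, H, W)
        weights = scores.softmax(dim=1)                   # normalize over views
        feats = feats.view(b, v, -1, h, w)
        return (weights * feats).sum(dim=1)               # (B, D, H, W)

# Usage: merge a batch of seven-view stacks into single feature maps.
merge = MergeBlock()
stack = torch.randn(2, 7, 3, 256, 256)  # two samples, seven polarized views
fused = merge(stack)                    # (2, 64, 256, 256)
```

A softmax over the view axis lets the block down-weight extinction-darkened angles pixel by pixel, which is one plausible reading of how merging seven polarized views could mitigate extinction.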

Abstract

Grain-edge segmentation (GES) and lithology semantic segmentation (LSS) are two pivotal tasks for quantifying rock fabric and composition. However, these two tasks are often treated separately, and segmentation quality remains unsatisfactory even though expensive, time-consuming, expert-annotated datasets have been used. Recently, foundation models, especially the Segment Anything Model (SAM), have demonstrated impressive robustness in boundary alignment. However, directly adapting SAM to joint GES and LSS is nontrivial due to 1) a severe domain gap induced by extinction-dependent color variations and ultra-fine grain boundaries, and 2) the lack of modules for joint learning on multi-angle petrographic image stacks. In this paper, we propose Petro-SAM, a novel two-stage, multi-task framework that achieves high-quality joint GES and LSS on petrographic images. Specifically, building on SAM, we introduce a Merge Block to integrate seven polarized views, effectively resolving the extinction issue. Moreover, we introduce multi-scale feature fusion and color-entropy priors to refine the segmentation.
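
The abstract names color-entropy priors without defining them. One plausible formulation is the per-pixel Shannon entropy of intensity across the seven polarized angles; in the sketch below, the function name, binning scheme, and single-channel input are assumptions rather than the paper's method.

```python
# Hypothetical color-entropy prior: per-pixel Shannon entropy of intensity
# across polarized views. The formulation is an assumption, not the paper's.
import numpy as np

def color_entropy_prior(views: np.ndarray, num_bins: int = 16) -> np.ndarray:
    """Per-pixel entropy of intensity across polarized views.

    views: (V, H, W) single-channel stack, values in [0, 1]. Pixels whose
    appearance changes strongly with the polarization angle (high entropy)
    are likely grain interiors affected by extinction, while stable pixels
    (low entropy) often sit on boundaries or matrix.
    """
    v, h, w = views.shape
    bins = np.clip((views * num_bins).astype(int), 0, num_bins - 1)  # (V, H, W)
    entropy = np.zeros((h, w), dtype=np.float64)
    for k in range(num_bins):
        p = (bins == k).mean(axis=0)          # fraction of views falling in bin k
        mask = p > 0
        entropy[mask] -= p[mask] * np.log2(p[mask])
    return entropy / np.log2(num_bins)        # bound the prior to [0, 1]

# Usage: seven polarized views of one thin section, normalized to [0, 1].
stack = np.random.rand(7, 256, 256)
prior = color_entropy_prior(stack)  # (256, 256); higher = more angle-dependent
```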