Bridging Foundation Models and ASTM Metallurgical Standards for Automated Grain Size Estimation from Microscopy Images

arXiv cs.CV · April 22, 2026


Key Points

  • The paper presents an automated pipeline that combines a domain-adapted Cellpose-SAM dense instance segmentation approach with topology-aware gradient tracking and an ASTM E112 Jeffries planimetric module for standardized grain size estimation from microscopy images.
  • The method is benchmarked against U-Net, a prompt-adapted segmentation foundation model (MatSAM), and a vision-language model (Qwen2.5-VL-7B), with results showing the adapted pipeline better preserves topological separation for microscopic counting and measurement.
  • Out-of-the-box vision-language reasoning was found insufficient for localized spatial and dense counting tasks, and MatSAM exhibited over-segmentation issues despite domain-specific prompt generation.
  • The approach demonstrates strong few-shot scalability: with only two training samples, it predicts the ASTM grain size number (G) with a mean absolute percentage error (MAPE) as low as 1.50%.
  • Robustness experiments empirically validate the ASTM 50-grain sampling minimum by confirming stable performance across varying target grain counts.
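The ASTM E112 Jeffries planimetric method mentioned above converts a grain count within a known test area into the standardized grain size number G. A minimal sketch of that calculation is shown below; the function name and parameters are illustrative, not taken from the paper's repository, and the formula is the standard E112 relation G = 3.321928·log10(N_A) − 2.954 with N_A in grains per mm² at 1×.

```python
import math

def jeffries_grain_size(n_inside, n_intercepted, area_mm2, magnification):
    """Estimate ASTM grain size number G via the Jeffries planimetric
    method (ASTM E112). `area_mm2` is the test-figure area as measured
    on the image; `magnification` relates the image scale to 1X."""
    # Jeffries multiplier: converts the raw count to grains per mm^2 at 1X
    f = magnification ** 2 / area_mm2
    # Grains fully inside the test figure count once;
    # grains intercepted by its boundary count half
    n_a = f * (n_inside + n_intercepted / 2.0)
    # ASTM E112: G = 3.321928 * log10(N_A) - 2.954
    return 3.321928 * math.log10(n_a) - 2.954

# Sanity check: N_A = 15.5 grains/mm^2 at 1X corresponds to G ≈ 1
print(round(jeffries_grain_size(16, 0, 1.0, 1.0), 3))
```

In the paper's pipeline, the `n_inside` and `n_intercepted` counts would come from the Cellpose-SAM instance masks rather than manual counting.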

Abstract

Extracting standardized metallurgical metrics from microscopy images remains challenging due to complex grain morphology and the data demands of supervised segmentation. To bridge foundational computer vision with practical metallurgical evaluation, we propose an automated pipeline for dense instance segmentation and grain size estimation that adapts Cellpose-SAM to microstructures and integrates its topology-aware gradient tracking with an ASTM E112 Jeffries planimetric module. We systematically benchmark this pipeline against a classical convolutional network (U-Net), an adaptive-prompting vision foundation model (MatSAM) and a contemporary vision-language model (Qwen2.5-VL-7B). Our evaluations reveal that while the out-of-the-box vision-language model struggles with the localized spatial reasoning required for dense microscopic counting and MatSAM suffers from over-segmentation despite its domain-specific prompt generation, our adapted pipeline successfully maintains topological separation. Furthermore, experiments across progressively reduced training splits demonstrate exceptional few-shot scalability; utilizing only two training samples, the proposed system predicts the ASTM grain size number (G) with a mean absolute percentage error (MAPE) as low as 1.50%, while robustness testing across varying target grain counts empirically validates the ASTM 50-grain sampling minimum. These results highlight the efficacy of application-level foundation model integration for highly accurate, automated materials characterization. Our project repository is available at https://github.com/mueez-overflow/ASTM-Grain-Size-Estimator.
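The 1.50% figure reported above is a mean absolute percentage error over predicted G values. For concreteness, a minimal sketch of the metric, applied to hypothetical ground-truth and predicted grain size numbers (the values are illustrative, not from the paper):

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error (in percent) between
    ground-truth and predicted values."""
    return 100.0 * sum(abs(t - p) / abs(t)
                       for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical ground-truth vs. predicted ASTM grain size numbers G
print(round(mape([8.0, 9.0, 10.0], [8.1, 8.9, 10.1]), 2))
```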