Assessing the Potential of Masked Autoencoder Foundation Models in Predicting Downhole Metrics from Surface Drilling Data

arXiv cs.LG / 4/17/2026

Opinion / Ideas & Deep Analysis / Models & Research

Key Points

  • The article reviews 13 papers (2015–2025) on predicting downhole metrics in oil and gas drilling from surface time-series data, focusing on the challenge posed by scarce labeled downhole measurements.
  • It maps eight commonly collected surface sensor metrics to seven downhole target metrics used across the literature.
  • While existing work largely relies on ANN and LSTM-style models, the review finds that Masked Autoencoder Foundation Models (MAEFMs) have not been studied for this task.
  • The study argues that MAEFMs are a technically feasible improvement: self-supervised pretraining on abundant unlabeled data could support multi-task learning and better generalization across wells. It recommends benchmarking them against current baselines.
  • It positions MAEFMs as an underexplored opportunity for drilling analytics, while highlighting the need for empirical validation and assessment of broader applicability in oil and gas operations.

Abstract

Oil and gas drilling operations generate extensive time-series data from surface sensors, yet accurate real-time prediction of critical downhole metrics remains challenging due to the scarcity of labeled downhole measurements. This systematic mapping study reviews thirteen papers published between 2015 and 2025 to assess the potential of Masked Autoencoder Foundation Models (MAEFMs) for predicting downhole metrics from surface drilling data. The review identifies eight commonly collected surface metrics and seven target downhole metrics. Current approaches predominantly employ neural network architectures such as artificial neural networks (ANNs) and long short-term memory (LSTM) networks, yet no studies have explored MAEFMs despite their demonstrated effectiveness in time-series modeling. MAEFMs offer distinct advantages through self-supervised pre-training on abundant unlabeled data, enabling multi-task prediction and improved generalization across wells. This research establishes that MAEFMs represent a technically feasible but unexplored opportunity for drilling analytics and recommends future empirical validation of their performance against existing models, along with exploration of their broader applicability in oil and gas operations.
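
To make the self-supervised pre-training idea concrete, here is a minimal NumPy sketch of the masking step and the masked-reconstruction loss that masked autoencoders typically optimize. It is an illustration only, not the method of any reviewed paper. The helper names (`make_patches`, `random_patch_mask`, `masked_reconstruction_loss`), the patch length of 50, the 75% mask ratio, and the synthetic eight-channel input are all assumptions chosen for the example. A trained encoder-decoder would normally produce the reconstruction; a trivial mean-of-visible-patches baseline stands in for it here.

```python
import numpy as np


def make_patches(series, patch_len):
    """Split a (time, channels) array into non-overlapping patches.

    Returns shape (num_patches, patch_len, channels). Trailing samples that
    do not fill a whole patch are dropped.
    """
    num_patches = series.shape[0] // patch_len
    trimmed = series[: num_patches * patch_len]
    return trimmed.reshape(num_patches, patch_len, series.shape[1])


def random_patch_mask(num_patches, mask_ratio, rng):
    """Return a boolean vector; True marks a patch hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask


def masked_reconstruction_loss(patches, reconstruction, mask):
    """Mean squared error computed over the masked patches only."""
    diff = reconstruction[mask] - patches[mask]
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)

# Synthetic stand-in for unlabeled surface data: 1000 time steps, 8 channels.
surface = rng.normal(size=(1000, 8))

patches = make_patches(surface, patch_len=50)
mask = random_patch_mask(len(patches), mask_ratio=0.75, rng=rng)

# Placeholder "model": predict every patch as the mean of the visible patches.
# A real masked autoencoder would encode visible patches and decode masked ones.
baseline = np.broadcast_to(patches[~mask].mean(axis=0), patches.shape)
loss = masked_reconstruction_loss(patches, baseline, mask)
print(f"masked patches: {mask.sum()} of {len(mask)}, baseline loss: {loss:.3f}")
```

Because the loss needs only the surface signal itself, no downhole labels are required at this stage. Under this framing, labeled downhole measurements would be needed only later, when a prediction head is fitted on top of the pre-trained encoder.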