Do Masked Autoencoders Improve Downhole Prediction? An Empirical Study on Real Well Drilling Data

arXiv cs.LG / 4/24/2026


Key Points

  • The study addresses labeling asymmetry in downhole drilling telemetry, where abundant 1Hz surface data contrast with scarce, intermittent, and costly downhole labels.
  • It presents the first empirical evaluation of masked autoencoder (MAE) pretraining for predicting a downhole metric, Total Mud Volume, using two Utah FORGE geothermal wells with ~3.5M timesteps.
  • Across a systematic search of 72 MAE configurations, the best MAE setup cuts test mean absolute error by 19.8% versus a supervised GRU baseline, though it still trails a supervised LSTM baseline by 6.4%.
  • The analysis finds latent-space width is the most influential design factor for performance (Pearson r = -0.59 with test MAE), while the masking ratio has little impact—likely due to high temporal redundancy in 1Hz drilling signals.
  • Overall, the results support MAE pretraining as a viable approach for drilling analytics and clarify when it tends to deliver the most benefit compared with fully supervised baselines.

Abstract

Downhole drilling telemetry presents a fundamental labeling asymmetry: surface sensor data are generated continuously at 1 Hz, while labeled downhole measurements are costly, intermittent, and scarce. Current machine learning approaches for downhole metric prediction universally adopt fully supervised training from scratch, which is poorly suited to this data regime. We present the first empirical evaluation of masked autoencoder (MAE) pretraining for downhole drilling metric prediction. Using two publicly available Utah FORGE geothermal wells comprising approximately 3.5 million timesteps of multivariate drilling telemetry, we conduct a systematic full-factorial design space search across 72 MAE configurations and compare them against supervised LSTM and GRU baselines on the task of predicting Total Mud Volume. Results show that the best MAE configuration reduces test mean absolute error by 19.8% relative to the supervised GRU baseline, while trailing the supervised LSTM baseline by 6.4%. Analysis of design dimensions reveals that latent space width is the dominant architectural choice (Pearson r = -0.59 with test MAE), while masking ratio has negligible effect, an unexpected finding attributed to high temporal redundancy in 1 Hz drilling data. These results establish MAE pretraining as a viable paradigm for drilling analytics and identify the conditions under which it is most beneficial.
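To make the pretraining objective concrete, the sketch below shows the core MAE step on a toy telemetry window: randomly mask a fraction of timesteps, encode the corrupted window into a latent of chosen width, decode, and score reconstruction only on the masked positions. This is a minimal, numpy-only illustration of the general MAE objective; the function names, linear encoder/decoder, and all shapes are assumptions for clarity, not the paper's actual architecture or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def mae_pretrain_loss(window, mask_ratio, W_enc, W_dec):
    """One masked-autoencoder pretraining step (illustrative sketch).

    window     : (T, F) multivariate telemetry window
    mask_ratio : fraction of timesteps hidden from the encoder
    W_enc      : (F, D) encoder weights -- D is the latent width,
                 the design factor the study found most influential
    W_dec      : (D, F) decoder weights
    Returns the reconstruction loss computed ONLY on masked timesteps,
    which is the standard MAE pretraining objective.
    """
    T, _ = window.shape
    n_mask = int(T * mask_ratio)
    masked_idx = rng.choice(T, size=n_mask, replace=False)

    corrupted = window.copy()
    corrupted[masked_idx] = 0.0      # hide the selected timesteps
    latent = corrupted @ W_enc       # encode into a width-D latent
    recon = latent @ W_dec           # decode back to F channels

    # loss is evaluated only where the input was masked
    return float(np.mean((recon[masked_idx] - window[masked_idx]) ** 2))

# toy window: 60 s of 1 Hz telemetry with 5 surface channels, latent width 8
T, F, D = 60, 5, 8
window = rng.normal(size=(T, F))
W_enc = rng.normal(scale=0.1, size=(F, D))
W_dec = rng.normal(scale=0.1, size=(D, F))
loss = mae_pretrain_loss(window, mask_ratio=0.5, W_enc=W_enc, W_dec=W_dec)
print(loss)
```

After pretraining on the abundant unlabeled surface data, the encoder would be kept and fine-tuned on the scarce labeled downhole targets; the negligible effect of `mask_ratio` reported in the study is plausible here because adjacent 1 Hz samples are highly redundant, so even heavy masking leaves enough context to reconstruct.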