Triplet Feature Fusion for Equipment Anomaly Prediction: An Open-Source Methodology Using Small Foundation Models

arXiv stat.ML / 4/8/2026


Key Points

  • The paper proposes an open-source anomaly-prediction methodology that fuses three feature types (statistical sensor features, time-series embeddings from a small LoRA-adapted IBM Granite TinyTimeMixer, and multilingual text embeddings of Japanese equipment master records) into a single 1,116-dimensional feature vector, the "triplet".
  • A compact LightGBM classifier (<3 MB) is trained to forecast equipment anomalies at 30/60/90-day horizons, with the full inference pipeline running on CPU in under 2 ms for air-gapped or edge deployments.
  • Results on 64 HVAC units (67,045 samples) show strong performance, including Precision 0.992, F1 0.958, and ROC-AUC 0.998 at the 30-day horizon.
  • The approach substantially lowers the false-positive rate, from 0.6% to 0.1% (an 83% reduction), with equipment-type conditioning via the text embedding playing a key role.
  • The authors report that the text and time-series embeddings cluster by fault archetypes, offering an interpretable mechanism for how compact multilingual representations improve discrimination without explicit category labels.
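The fusion step behind the first bullet is plain concatenation of the three blocks into one vector. A minimal sketch, assuming only the dimensions stated above (the actual feature extractors run upstream and are not reproduced here):

```python
import numpy as np

def fuse_triplet(x_stats, y_ts, z_text):
    """Concatenate the three feature blocks into one vector h = [x; y; z]."""
    x = np.asarray(x_stats, dtype=np.float32)   # statistical features, R^28
    y = np.asarray(y_ts, dtype=np.float32)      # TTM time-series embedding, R^64
    z = np.asarray(z_text, dtype=np.float32)    # multilingual text embedding, R^1024
    assert x.shape == (28,) and y.shape == (64,) and z.shape == (1024,)
    return np.concatenate([x, y, z])            # fused triplet, R^1116

# Placeholder inputs stand in for the real extractor outputs
rng = np.random.default_rng(0)
h = fuse_triplet(rng.normal(size=28), rng.normal(size=64), rng.normal(size=1024))
print(h.shape)  # (1116,)
```

The 28 + 64 + 1024 = 1,116 dimensions match the paper's figure; the fused vector is what the LightGBM classifier consumes.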

Abstract

Predicting equipment anomalies before they escalate into failures is a critical challenge in industrial facility management. Existing approaches rely either on hand-crafted threshold rules, which lack generalizability, or on large neural models that are impractical for on-site, air-gapped deployments. We present an industrial methodology that resolves this tension by combining open-source small foundation models into a unified 1,116-dimensional Triplet Feature Fusion pipeline. This pipeline integrates: (1) statistical features (x in R^{28}) derived from 90-day sensor histories, (2) time-series embeddings (y in R^{64}) from a LoRA-adapted IBM Granite TinyTimeMixer (TTM, 133K parameters), and (3) multilingual text embeddings (z in R^{1024}) extracted from Japanese equipment master records via multilingual-e5-large. The concatenated triplet h = [x; y; z] is processed by a LightGBM classifier (< 3 MB) trained to predict anomalies at 30-, 60-, and 90-day horizons. All components use permissive open-source licenses (Apache 2.0 / MIT). The inference-time pipeline runs entirely on CPU in under 2 ms, enabling edge deployment on co-located hardware without cloud dependency. On a dataset of 64 HVAC units comprising 67,045 samples, the triplet model achieves Precision = 0.992, F1 = 0.958, and ROC-AUC = 0.998 at the 30-day horizon. Crucially, it reduces the False Positive Rate from 0.6 percent (baseline) to 0.1 percent, an 83 percent reduction attributable to equipment-type conditioning via text embedding z. Cluster analysis reveals that the embeddings align time-series signatures with distinct fault archetypes, explaining how compact multilingual representations improve discrimination without explicit categorical encoding.
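The 30-, 60-, and 90-day horizons imply one binary label per horizon for each sample: positive if an anomaly occurs within the next H days. The paper's exact labeling rule is not given in this abstract, so the sketch below is an assumed, illustrative version (function name and window convention are hypothetical):

```python
from datetime import date, timedelta

def horizon_labels(sample_date, anomaly_dates, horizons=(30, 60, 90)):
    """Label a sample positive for horizon H if any anomaly falls within (t, t+H]."""
    labels = {}
    for h in horizons:
        window_end = sample_date + timedelta(days=h)
        labels[h] = any(sample_date < a <= window_end for a in anomaly_dates)
    return labels

# An anomaly 45 days out is missed by the 30-day label but caught at 60 and 90
labels = horizon_labels(date(2026, 1, 1), [date(2026, 2, 15)])
print(labels)  # {30: False, 60: True, 90: True}
```

Under this reading, each horizon is a separate binary classification task over the same fused features, which is consistent with reporting per-horizon Precision, F1, and ROC-AUC.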