Informed Machine Learning with Knowledge Landmarks

arXiv cs.LG / 4/2/2026


Key Points

  • The paper presents “Informed Machine Learning” as a unified framework for combining knowledge with data to build more generalizable ML models.
  • It introduces KD-ML (Knowledge-Data Machine Learning), which integrates numeric datasets with higher-level “knowledge landmarks” represented as input-output information granules.
  • The authors develop a detailed KD-ML design process and propose an augmented loss function that balances data fitting with a granular regularizer enforcing constraints derived from the knowledge landmarks.
  • They analyze how the loss hyperparameter and factors like data noise level and the granularity of knowledge landmarks affect model performance and guidance.
  • Experiments on two physics-governed benchmarks show KD-ML consistently outperforms purely data-driven ML baselines, suggesting benefits for knowledge-augmented learning in physics-related settings.

Abstract

Informed Machine Learning has emerged as a viable generalization of Machine Learning (ML) by building a unified conceptual and algorithmic setting for constructing models on a combined basis of knowledge and data. Physics-informed ML, which incorporates physics equations, is one development within Informed Machine Learning. This study proposes a novel direction of Knowledge-Data ML, referred to as KD-ML, in which numeric data are integrated with knowledge tidbits expressed in the form of granular knowledge landmarks. We advocate that data and knowledge are complementary in several fundamental ways: data are precise (numeric) and local, usually confined to some region of the input space, while knowledge is global and formulated at a higher level of abstraction. The knowledge can be represented as information granules and organized as a collection of input-output information granules called knowledge landmarks. By virtue of this evident complementarity, we develop a comprehensive design process for the KD-ML model and formulate an original augmented loss function L, which additively combines a component responsible for optimizing the model on the available numeric data with a second component that plays the role of a granular regularizer, making the model adhere to the granular constraints (knowledge landmarks). We show the role of the hyperparameter positioned in the loss function, which balances the contribution and guiding role of data and knowledge, and point to some essential tendencies associated with the quality of data (noise level) and the level of granularity of the knowledge landmarks. Experiments on two physics-governed benchmarks demonstrate that the proposed KD-ML model consistently outperforms data-driven ML models.
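The abstract describes an additive loss L with a data-fitting term and a granular regularizer balanced by a hyperparameter. The paper does not spell out the implementation here, so the sketch below is a minimal illustration under assumptions: landmarks are taken to be pairs of intervals ((x_lo, x_hi), (y_lo, y_hi)), the data term is mean squared error, the regularizer is a hinge-style penalty on predictions leaving a landmark's output interval, and the names (`lam`, `landmarks`, sampling scheme) are all hypothetical.

```python
import numpy as np

def data_loss(y_pred, y_true):
    """Standard data-fitting term (mean squared error) -- an assumed choice."""
    return float(np.mean((y_pred - y_true) ** 2))

def granular_regularizer(model, landmarks, n_samples=32, rng=None):
    """Penalize predictions that fall outside a landmark's output interval
    for inputs sampled inside its input interval.

    Each landmark is assumed to be an input-output information granule
    given as a pair of intervals: ((x_lo, x_hi), (y_lo, y_hi))."""
    rng = rng or np.random.default_rng(0)
    penalty = 0.0
    for (x_lo, x_hi), (y_lo, y_hi) in landmarks:
        xs = rng.uniform(x_lo, x_hi, size=n_samples)  # probe the input granule
        ys = model(xs)
        # Hinge-style violation: zero whenever ys lies inside [y_lo, y_hi].
        penalty += float(np.mean(np.maximum(0.0, y_lo - ys) ** 2
                                 + np.maximum(0.0, ys - y_hi) ** 2))
    return penalty / len(landmarks)

def augmented_loss(model, x, y, landmarks, lam=0.1):
    """L = data term + lam * granular regularizer.

    The hyperparameter lam balances the guiding roles of data and knowledge,
    mirroring the balancing hyperparameter described in the abstract."""
    return data_loss(model(x), y) + lam * granular_regularizer(model, landmarks)
```

As a quick check of the intended behavior: a model y = 2x with the landmark ((0, 1), (0, 2)) incurs zero regularizer penalty, since its outputs stay inside the output interval over the whole input interval, while a landmark demanding outputs in [5, 6] yields a positive penalty.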