Semantic Embeddings of Chemical Elements for Enhanced Materials Inference and Discovery

arXiv cs.CL / 4/30/2026


Key Points

  • The paper introduces a framework to generate universal semantic embeddings for chemical elements to improve materials inference and accelerate discovery.
  • It uses ElementBERT, a BERT-based model trained on 1.29 million abstracts related to alloy science, to learn latent, alloy-specific contextual relationships.
  • The resulting semantic embeddings act as robust elemental descriptors and reportedly outperform traditional empirical descriptors across several downstream tasks.
  • The framework improves performance on three kinds of downstream task: predicting mechanical and transformation properties, classifying phase structures, and optimizing material properties via Bayesian optimization.
  • Experiments on titanium, high-entropy, and shape memory alloys show gains of up to 23% in prediction accuracy, and ElementBERT also outperforms general-purpose BERT variants.
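
To make the idea of "embeddings as elemental descriptors" concrete, here is a minimal sketch of one common way to turn per-element vectors into a fixed-length alloy feature vector: a composition-weighted average. The summary does not say how the paper pools element embeddings, so the pooling rule, the `alloy_descriptor` helper, the 4-dimensional toy vectors, and the composition fractions are all assumptions made for illustration, not ElementBERT outputs.

```python
# Hypothetical 4-dimensional element embeddings. Real ElementBERT vectors
# would come from the trained model and be much longer; these numbers are
# made up purely to show the mechanics.
TOY_EMBEDDINGS = {
    "Ti": [0.2, -0.1, 0.5, 0.0],
    "Al": [-0.3, 0.4, 0.1, 0.2],
    "V": [0.1, 0.3, -0.2, 0.6],
}


def alloy_descriptor(composition, embeddings):
    """Return the composition-weighted mean of element embeddings.

    composition maps element symbol -> fraction (any positive scale;
    fractions are normalized to sum to 1 before weighting).
    """
    total = sum(composition.values())
    if total <= 0:
        raise ValueError("composition fractions must sum to a positive value")
    dim = len(next(iter(embeddings.values())))
    descriptor = [0.0] * dim
    for element, fraction in composition.items():
        vector = embeddings[element]  # KeyError if the element is unknown
        weight = fraction / total
        for i in range(dim):
            descriptor[i] += weight * vector[i]
    return descriptor


# Illustrative fractions for a Ti-Al-V alloy (assumed values).
features = alloy_descriptor({"Ti": 0.90, "Al": 0.06, "V": 0.04}, TOY_EMBEDDINGS)
```

The resulting `features` list has the same length as each element embedding regardless of how many elements the alloy contains, which is what lets a single regression or classification model handle alloys of different complexity.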

Abstract

We present a framework for generating universal semantic embeddings of chemical elements to advance materials inference and discovery. This framework leverages ElementBERT, a domain-specific BERT-based natural language processing model trained on 1.29 million abstracts of alloy-related scientific papers, to capture latent knowledge and contextual relationships specific to alloys. These semantic embeddings serve as robust elemental descriptors, consistently outperforming traditional empirical descriptors with significant improvements across multiple downstream tasks. These include predicting mechanical and transformation properties, classifying phase structures, and optimizing materials properties via Bayesian optimization. Applications to titanium alloys, high-entropy alloys, and shape memory alloys demonstrate up to 23% gains in prediction accuracy. Our results show that ElementBERT surpasses general-purpose BERT variants by encoding specialized alloy knowledge. By bridging contextual insights from scientific literature with quantitative inference, our framework accelerates the discovery and optimization of advanced materials, with potential applications extending beyond alloys to other material classes.
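
The abstract mentions optimizing materials properties via Bayesian optimization but does not name an acquisition function. As a rough sketch of how such a loop picks the next alloy to test, the snippet below scores candidates with expected improvement (EI), a standard choice. The use of EI, the `expected_improvement` helper, and the candidate means and standard deviations are assumptions for illustration; in practice the predictions would come from a surrogate model trained on alloy descriptors.

```python
import math


def expected_improvement(mean, std, best_so_far, xi=0.0):
    """Expected improvement for maximization under a Gaussian prediction.

    mean, std: surrogate model's predictive mean and standard deviation.
    best_so_far: best property value observed so far.
    xi: optional exploration margin (0.0 means plain EI).
    """
    improvement = mean - best_so_far - xi
    if std <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(improvement, 0.0)
    z = improvement / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return improvement * cdf + std * pdf


# Hypothetical candidates: name -> (predicted mean, predicted std).
candidates = {"A": (510.0, 5.0), "B": (500.0, 40.0), "C": (480.0, 10.0)}
best_observed = 505.0

scores = {
    name: expected_improvement(mean, std, best_observed)
    for name, (mean, std) in candidates.items()
}
next_candidate = max(scores, key=scores.get)
```

EI trades off a high predicted value against high uncertainty, so a candidate whose mean sits slightly below the current best can still be selected if the surrogate is unsure about it.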