Adaptation of AI-accelerated CFD Simulations to the IPU platform

arXiv cs.AI / 5/4/2026


Key Points

  • The paper evaluates how Intelligence Processing Units (IPUs) can accelerate “AI for simulation” workloads, specifically machine-learning models for computational fluid dynamics (CFD).
  • It adapts a TensorFlow-based training pipeline (using Poplar SDK) to the IPU-POD16 platform and trains on data generated from OpenFOAM simulations to predict CFD simulation states.
  • The authors use the popdist library to address a host-side training-data feeding bottleneck, achieving up to a 34% speedup.
  • Data parallelism across two IPUs does not improve throughput over a single IPU because of communication overheads. Beyond that point, however, scaling holds up well: going from 2 to 16 IPUs raises throughput from 560.8 to 2805.8 samples per second, roughly a 5× gain for an 8× increase in IPUs.
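The scaling claim can be checked with simple arithmetic. The sketch below computes speedup and parallel efficiency (measured speedup divided by ideal linear speedup) from the two throughput figures reported in the paper. The helper name `scaling_stats` is hypothetical and only illustrates the calculation.

```python
# Throughput figures reported in the paper, in samples per second, keyed by IPU count.
throughput = {2: 560.8, 16: 2805.8}


def scaling_stats(base_ipus, base_tput, ipus, tput):
    """Return (speedup, ideal_speedup, parallel_efficiency) relative to a baseline run."""
    speedup = tput / base_tput        # measured gain in throughput
    ideal = ipus / base_ipus          # gain under perfect linear scaling
    return speedup, ideal, speedup / ideal


speedup, ideal, eff = scaling_stats(2, throughput[2], 16, throughput[16])
print(f"speedup {speedup:.2f}x of ideal {ideal:.0f}x, efficiency {eff:.1%}")
```

Running it prints `speedup 5.00x of ideal 8x, efficiency 62.5%`. Using the 2-IPU run as the baseline, rather than the 1-IPU run, matches the paper's framing: the communication cost first appears at two IPUs, so this ratio isolates how well the inter-IPU links scale once that cost is already being paid.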

Abstract

Intelligence Processing Units (IPUs) have proven useful for many AI applications. In this paper, we evaluate them within the emerging field of “AI for simulation”, where traditional numerical simulations are supported by artificial intelligence approaches. We focus specifically on a program for training machine learning models supporting a computational fluid dynamics (CFD) application. We use the custom TensorFlow provided by the Poplar SDK to adapt the program for the IPU-POD16 platform and investigate its ease of use and performance scalability. Training a model on data from OpenFOAM simulations allows us to get accurate simulation state predictions at test time. We show how to utilize the popdist library to overcome a performance bottleneck in feeding training data to the IPU on the host side, achieving a speedup of up to 34%. Due to communication overheads, using data parallelism to utilize two IPUs instead of one does not improve the throughput. However, once the intra-IPU costs have been paid, the hardware capabilities for inter-IPU communication allow for good scalability. Increasing the number of IPUs from 2 to 16 improves the throughput from 560.8 to 2805.8 samples/s.
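The two mechanisms the abstract relies on, giving each replica its own slice of the training data and combining per-replica results across IPUs, can be illustrated with a toy, pure-Python sketch of synchronous data parallelism. This is not the paper's code and does not use the Poplar SDK or popdist APIs. The function names, the round-robin sharding (similar in spirit to `tf.data.Dataset.shard`), and the one-parameter linear model are assumptions chosen for clarity.

```python
# Hypothetical illustration of synchronous data parallelism.
# Not the paper's code and not the Poplar SDK / popdist API.
import math


def shard(samples, num_replicas, replica_index):
    """Round-robin split: replica r receives samples r, r + n, r + 2n, ..."""
    return samples[replica_index::num_replicas]


def grad_mse(w, batch):
    """Gradient w.r.t. w of the mean squared error of y ~ w * x over (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_grad(w, batch, num_replicas):
    """Each replica computes a gradient on its own shard; the per-replica
    gradients are then averaged. On real hardware this averaging is the
    all-reduce that crosses the inter-IPU links."""
    grads = [grad_mse(w, shard(batch, num_replicas, r)) for r in range(num_replicas)]
    return sum(grads) / num_replicas


# Toy data drawn from y = 3x, evaluated at w = 1.0.
batch = [(float(x), 3.0 * x) for x in range(1, 9)]
full = grad_mse(1.0, batch)
parallel = data_parallel_grad(1.0, batch, num_replicas=4)
print(full, parallel, math.isclose(full, parallel))
```

With equal-sized shards, the averaged gradient matches the full-batch gradient (both are -102.0 here), so the model update is unchanged while the compute is split across replicas. The price is the averaging step, which needs communication between devices. That is the kind of overhead the abstract cites as the reason two IPUs do not outperform one.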