Tokenised Flow Matching for Hierarchical Simulation Based Inference

arXiv cs.LG / 4/23/2026

Key Points

  • The paper targets a major practical bottleneck in Simulation Based Inference (SBI): expensive simulator evaluations, especially in hierarchical models.
  • It introduces likelihood factorisation (LF) training that learns per-site neural surrogates from single-site simulations and then assembles synthetic multi-site observations for amortised inference of the full hierarchical posterior.
  • Building on LF, the authors propose Tokenised Flow Matching for Posterior Estimation (TFMPE), which uses tokenised flow matching to handle function-valued observations under likelihood factorisation.
  • To measure progress systematically, they also release a benchmark for hierarchical SBI and validate TFMPE on both the benchmark and realistic infectious-disease and computational fluid dynamics models.
  • Results indicate TFMPE produces well-calibrated posteriors while lowering computational cost compared with prior hierarchical SBI approaches.
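
For readers new to the setting, the structure that likelihood factorisation relies on can be written out as below. The notation is an assumption made for illustration and is not taken from the paper: \theta stands for the shared global parameters, \phi_i for the parameters of site i, x_i for the observation at site i, and N for the number of sites. The factorisation also assumes that sites are conditionally independent given the global parameters, which is the usual reading of "exchangeable" sites.

```latex
\begin{align}
p(\theta, \phi_{1:N} \mid x_{1:N})
  \;\propto\; p(\theta) \prod_{i=1}^{N} p(\phi_i \mid \theta)\, p(x_i \mid \theta, \phi_i)
\end{align}
```

Under this reading, every site contributes a likelihood term of the same form, p(x_i \mid \theta, \phi_i), so a surrogate for that single term can in principle be learned from single-site simulations and reused across all N sites.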

Abstract

The cost of simulator evaluations is a key practical bottleneck for Simulation Based Inference (SBI). In hierarchical settings with shared global parameters and exchangeable site-level parameters and observations, this structure can be exploited to improve simulation efficiency. Existing hierarchical SBI approaches factorise the posterior yet still simulate across multiple sites per training sample; we instead explore likelihood factorisation (LF) to train from single-site simulations. In LF sampling, we learn a per-site neural surrogate of the simulator and then assemble synthetic multi-site observations to amortise inference for the full hierarchical posterior. Building on this, we propose Tokenised Flow Matching for Posterior Estimation (TFMPE), a tokenised flow matching approach that supports function-valued observations through likelihood factorisation. To enable systematic evaluation, we introduce a benchmark for hierarchical SBI. We validate TFMPE on this benchmark and on realistic infectious disease and computational fluid dynamics models, finding well-calibrated posteriors while reducing computational cost.
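
The LF workflow described in the abstract (simulate single sites, fit a per-site surrogate, then assemble synthetic multi-site observations) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not the authors' implementation: the simulator `simulate_site`, the priors, and the helper names are hypothetical, and a least-squares linear-Gaussian fit stands in for the neural surrogate used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_site(theta, phi, rng, n_obs=5):
    """Hypothetical single-site simulator: noisy readings centred on theta + phi."""
    return theta + phi + 0.1 * rng.standard_normal(n_obs)

# Step 1: build a training set in which each simulator call covers ONE site only.
n_train = 2000
theta_train = rng.standard_normal(n_train)        # toy global prior p(theta)
phi_train = 0.5 * rng.standard_normal(n_train)    # toy local prior (independent of theta here)
x_train = np.stack(
    [simulate_site(t, p, rng) for t, p in zip(theta_train, phi_train)]
)                                                 # shape (n_train, n_obs)

# Step 2: fit a per-site surrogate of p(x | theta, phi).
# A linear-Gaussian least-squares fit is used as a stand-in for a neural surrogate.
features = np.column_stack([theta_train, phi_train, np.ones(n_train)])
weights, *_ = np.linalg.lstsq(features, x_train, rcond=None)   # shape (3, n_obs)
resid_std = (x_train - features @ weights).std(axis=0)         # shape (n_obs,)

def surrogate_sample(theta, phi, rng):
    """Draw one synthetic single-site observation from the fitted surrogate."""
    mean = np.array([theta, phi, 1.0]) @ weights
    return mean + resid_std * rng.standard_normal(mean.shape)

# Step 3: assemble a synthetic multi-site observation without calling the simulator.
def assemble_multi_site(n_sites, rng):
    theta = rng.standard_normal()                 # one shared global parameter
    phi = 0.5 * rng.standard_normal(n_sites)      # one local parameter per site
    x = np.stack([surrogate_sample(theta, p, rng) for p in phi])
    return theta, phi, x

theta, phi, x = assemble_multi_site(n_sites=8, rng=rng)
print(x.shape)  # one row of observations per site
```

In this sketch, tuples `(theta, phi, x)` produced by `assemble_multi_site` would serve as training data for an amortised posterior estimator. That estimator (TFMPE in the paper) is not shown. The point of the design is that the real simulator is only ever called for single sites, while the number of sites in the synthetic observation can be chosen freely at assembly time.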