AE-ViT: Stable Long-Horizon Parametric Partial Differential Equations Modeling

arXiv cs.LG / 4/9/2026


Key Points

  • The paper proposes AE-ViT, a deep learning reduced-order modeling framework for parametric PDEs that combines a convolutional encoder, a transformer in latent space, and a reconstruction decoder.
  • It targets limitations of existing ROMs by improving long-horizon prediction efficiency while maintaining fidelity closer to full-field models, especially for systems requiring long-range spatial interactions.
  • The method introduces joint training with multi-stage parameter injection and coordinate-channel injection so the latent dynamics are better conditioned on PDE parameters and retain spatial structure via physical coordinates.
  • Experiments on Advection-Diffusion-Reaction and Navier-Stokes cylinder-wake problems show substantially lower relative rollout error (roughly a 5× improvement) versus comparable DL-ROM, latent-transformer, and ViT baselines, including for joint prediction of multiple fields with differing magnitudes and parameter sensitivities.
  • Overall, AE-ViT addresses the fact that, in parametric PDEs, the initial condition alone does not determine the trajectory: the model dynamically adapts its computations to the governing parameters rather than learning a single fixed response.
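The architecture described in the key points can be sketched as follows. This is a hypothetical minimal PyTorch implementation, not the paper's code: a convolutional encoder compresses the field into latent tokens, a small transformer evolves them with the PDE parameters injected at multiple stages, and a transposed-convolution decoder reconstructs the field. All module names, layer sizes, and the simple additive form of parameter injection are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ParamInject(nn.Module):
    """Adds a learned projection of the PDE parameters to every latent token
    (one simple way to realize multi-stage parameter injection)."""
    def __init__(self, n_params, d_model):
        super().__init__()
        self.proj = nn.Linear(n_params, d_model)

    def forward(self, tokens, params):
        # tokens: (B, N, d_model), params: (B, n_params)
        return tokens + self.proj(params).unsqueeze(1)

class LatentEvolver(nn.Module):
    def __init__(self, n_fields=2, n_params=2, d_model=32, n_blocks=2):
        super().__init__()
        # +2 input channels for the normalized (x, y) physical coordinates
        # (coordinate-channel injection).
        self.encoder = nn.Sequential(
            nn.Conv2d(n_fields + 2, d_model, kernel_size=4, stride=4),
            nn.GELU())
        # One injection before the transformer and one after each block.
        self.inject = nn.ModuleList(ParamInject(n_params, d_model)
                                    for _ in range(n_blocks + 1))
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4, dim_feedforward=64,
                                       batch_first=True)
            for _ in range(n_blocks))
        self.decoder = nn.ConvTranspose2d(d_model, n_fields,
                                          kernel_size=4, stride=4)

    def forward(self, u, params):
        B, _, H, W = u.shape
        # Append x/y coordinate grids in [0, 1] as extra channels.
        ys, xs = torch.meshgrid(torch.linspace(0, 1, H),
                                torch.linspace(0, 1, W), indexing="ij")
        coords = torch.stack([xs, ys]).expand(B, 2, H, W)
        z = self.encoder(torch.cat([u, coords], dim=1))  # (B, d, H/4, W/4)
        tokens = z.flatten(2).transpose(1, 2)            # (B, N, d)
        tokens = self.inject[0](tokens, params)
        for blk, inj in zip(self.blocks, self.inject[1:]):
            tokens = inj(blk(tokens), params)            # multi-stage injection
        z = tokens.transpose(1, 2).reshape(B, -1, H // 4, W // 4)
        return self.decoder(z)                           # predicted next field

model = LatentEvolver()
u = torch.randn(3, 2, 32, 32)   # batch of two-field snapshots
p = torch.randn(3, 2)           # PDE parameters (e.g., diffusivity, Reynolds)
out = model(u, p)
print(tuple(out.shape))         # (3, 2, 32, 32)
```

At rollout time such a model would be applied autoregressively, feeding each predicted field back in as the next input while keeping the parameter vector fixed.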

Abstract

Deep Learning Reduced Order Models (ROMs) are becoming increasingly popular as surrogate models for parametric partial differential equations (PDEs) due to their ability to handle high-dimensional data, approximate highly nonlinear mappings, and exploit GPU acceleration. Existing approaches typically learn evolution either on the full solution field, which requires capturing long-range spatial interactions at high computational cost, or on compressed latent representations obtained from autoencoders, which reduces the cost but often yields latent vectors that are difficult to evolve, since they primarily encode spatial information. Moreover, in parametric PDEs, the initial condition alone is not sufficient to determine the trajectory, and most current approaches are not evaluated on jointly predicting multiple solution components with differing magnitudes and parameter sensitivities. To address these challenges, we propose a joint model consisting of a convolutional encoder, a transformer operating on latent representations, and a decoder for reconstruction. The main novelties are joint training with multi-stage parameter injection and coordinate-channel injection: parameters are injected at multiple stages to improve conditioning, and physical coordinates are encoded as input channels to preserve spatial information. This allows the model to dynamically adapt its computations to the specific PDE parameters governing each system, rather than learning a single fixed response. Experiments on the Advection-Diffusion-Reaction equation and Navier-Stokes flow around a cylinder demonstrate that our approach combines the efficiency of latent evolution with the fidelity of full-field models, outperforming DL-ROMs, latent transformers, and plain ViTs in multi-field prediction and reducing the relative rollout error by approximately a factor of five.
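The headline metric, relative rollout error, can be computed as below. This is a hedged sketch: the paper's exact definition (norm choice, per-field weighting, averaging over time) may differ, so the per-timestep relative L2 norm averaged over the rollout is an assumption.

```python
import numpy as np

def relative_rollout_error(pred, true, eps=1e-12):
    """Mean over rollout timesteps of ||pred_t - true_t||_2 / ||true_t||_2.

    pred, true: arrays of shape (T, ...) holding predicted and reference
    solution fields for each of the T autoregressive steps.
    """
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    errs = [np.linalg.norm(p - t) / (np.linalg.norm(t) + eps)
            for p, t in zip(pred, true)]
    return float(np.mean(errs))

# A uniform 5% over-prediction over a 10-step, two-field rollout
# yields a relative rollout error of 0.05.
true = np.ones((10, 2, 16, 16))
pred = 1.05 * true
err = relative_rollout_error(pred, true)
print(round(err, 4))  # 0.05
```

Under this metric, the reported "approximately a factor of five" improvement would mean, for example, a baseline error of 0.25 reduced to about 0.05.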