FluidWorld: Reaction-Diffusion Dynamics as a Predictive Substrate for World Models

arXiv cs.LG · March 24, 2026


Key Points

  • The paper proposes FluidWorld, a world model that predicts future states by directly integrating reaction-diffusion PDEs rather than using a separate Transformer or ConvLSTM predictor network.
  • In a controlled, parameter-matched ablation on unconditional UCF-101 video prediction (64×64), FluidWorld matches the baselines' single-step prediction loss while achieving roughly 2× lower reconstruction error than both a self-attention Transformer and a ConvLSTM.
  • FluidWorld’s learned representations show improved spatial structure preservation (10–15% higher) and higher effective dimensionality (18–25% more), suggesting better retention of spatial information.
  • FluidWorld maintains coherent multi-step rollouts, whereas both the Transformer and ConvLSTM baselines degrade rapidly over the same horizons.
  • The approach is argued to be spatially efficient, with O(N) spatial complexity from local diffusion updates rather than pairwise attention, and all training and inference were run on a single consumer PC without large-scale compute.
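The core idea above, using PDE integration itself as the predictor, can be sketched concretely. The paper's actual PDE form, coefficients, and latent-space coupling are not specified here, so the following uses classic Gray-Scott reaction-diffusion dynamics as a hypothetical stand-in: each update is a local 5-point stencil operation (O(N) in grid cells), and rolling the dynamics forward in time is what produces the future state, with no separate predictor network.

```python
import numpy as np

def laplacian(z):
    # 5-point stencil with periodic boundaries: a constant amount of
    # work per cell, hence O(N) per step in the number of grid cells.
    return (np.roll(z, 1, 0) + np.roll(z, -1, 0)
          + np.roll(z, 1, 1) + np.roll(z, -1, 1) - 4.0 * z)

def rd_step(u, v, du=0.16, dv=0.08, f=0.035, k=0.065, dt=1.0):
    # One explicit Euler step of Gray-Scott reaction-diffusion dynamics
    # (standard illustrative parameters, not the paper's).
    uvv = u * v * v
    u_next = u + dt * (du * laplacian(u) - uvv + f * (1.0 - u))
    v_next = v + dt * (dv * laplacian(v) + uvv - (f + k) * v)
    return u_next, v_next

# "Prediction" is just time integration: a local perturbation diffuses
# and reacts, evolving the state grid forward step by step.
u = np.ones((64, 64))
v = np.zeros((64, 64))
v[28:36, 28:36] = 0.5  # localized seed in the latent/state grid
for _ in range(100):
    u, v = rd_step(u, v)
```

In a world-model setting one would encode an observation into such a state grid, integrate for the desired horizon, and decode; the diffusion term is what propagates information globally over multiple steps despite each update being purely local.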

Abstract

World models learn to predict future states of an environment, enabling planning and mental simulation. Current approaches default to Transformer-based predictors operating in learned latent spaces. This comes at a cost: O(N²) computation and no explicit spatial inductive bias. This paper asks a foundational question: is self-attention necessary for predictive world modeling, or can alternative computational substrates achieve comparable or superior results? I introduce FluidWorld, a proof-of-concept world model whose predictive dynamics are governed by partial differential equations (PDEs) of reaction-diffusion type. Instead of using a separate neural network predictor, the PDE integration itself produces the future state prediction. In a strictly parameter-matched three-way ablation on unconditional UCF-101 video prediction (64×64, ~800K parameters, identical encoder, decoder, losses, and data), FluidWorld is compared against both a Transformer baseline (self-attention) and a ConvLSTM baseline (convolutional recurrence). While all three models converge to comparable single-step prediction loss, FluidWorld achieves 2× lower reconstruction error, produces representations with 10–15% higher spatial structure preservation and 18–25% more effective dimensionality, and critically maintains coherent multi-step rollouts where both baselines degrade rapidly. All experiments were conducted on a single consumer-grade PC (Intel Core i5, NVIDIA RTX 4070 Ti), without any large-scale compute. These results establish that PDE-based dynamics, which natively provide O(N) spatial complexity, adaptive computation, and global spatial coherence through diffusion, are a viable and parameter-efficient alternative to both attention and convolutional recurrence for world modeling.
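The O(N) versus O(N²) contrast in the abstract can be made concrete with a back-of-the-envelope count (mine, not the paper's) of per-step interactions for a flattened 64×64 state: self-attention touches every pair of positions, while a diffusion stencil touches only a constant-size neighborhood per cell.

```python
def attention_cost(n):
    # Self-attention: every position attends to every position.
    return n * n  # O(N^2) pairwise interactions per step

def stencil_cost(n, k=5):
    # Diffusion via a k-point Laplacian stencil: constant work per cell.
    return n * k  # O(N) interactions per step

n = 64 * 64  # 4096 spatial positions
ratio = attention_cost(n) // stencil_cost(n)
print(ratio)  # the gap grows linearly with the number of positions
```

At 64×64 the stencil is already hundreds of times cheaper per step, and the ratio scales as N/k, so the advantage widens at higher resolutions.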