Data-Driven Open-Loop Simulation for Digital-Twin Operator Decision Support in Wastewater Treatment

arXiv cs.LG / 4/24/2026

Key Points

  • The paper presents CCSS-RS, a controlled continuous-time state-space (neural) simulator designed to support digital-twin operator decisions in wastewater treatment under prescribed control plans.
  • CCSS-RS separates historical state inference from future control/exogenous rollout and uses typed context encoding, gain-weighted forcing of drivers, semigroup-consistent rollouts, and Student‑t plus hurdle outputs to handle heavy-tailed and zero-inflated sensor data.
  • On the public Avedøre full-scale benchmark (906,815 timesteps, 43% missingness, irregular 1–20 minute sampling), CCSS-RS achieves RMSE 0.696 and CRPS 0.349 at H=1000, reducing RMSE by 40-46% relative to Neural CDE baselines and by 31-35% relative to simplified internal variants.
  • Four case studies using a frozen model checkpoint on test data illustrate operational relevance: oxygen-setpoint perturbations shift predicted ammonium by -2.3 to +1.4 over horizons 300-1000; a smoothed setpoint plan ranks first in multi-criterion screening; context-only sensor outages raise monitored-variable RMSE by at most 10%; and ammonium, nitrate, and oxygen rollouts remain more accurate than persistence.
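
The Student-t plus hurdle output head named in the key points can be illustrated with a per-reading negative log-likelihood. This is a minimal sketch, not the paper's implementation: the function names, the parameterization (a zero probability p_zero plus a location-scale Student-t density for nonzero readings), and the choice to skip missing readings in the loss are all assumptions made for illustration.

```python
import math


def student_t_logpdf(y, loc, scale, df):
    """Log-density of a location-scale Student-t distribution."""
    z = (y - loc) / scale
    return (
        math.lgamma((df + 1.0) / 2.0)
        - math.lgamma(df / 2.0)
        - 0.5 * math.log(df * math.pi)
        - math.log(scale)
        - (df + 1.0) / 2.0 * math.log1p(z * z / df)
    )


def hurdle_student_t_nll(y, p_zero, loc, scale, df, observed=True):
    """Negative log-likelihood of one sensor reading under a hurdle model.

    Assumed (hypothetical) parameterization:
      - with probability p_zero the reading is exactly zero;
      - otherwise it follows a Student-t with the given loc, scale, df.
    Missing readings (observed=False) contribute nothing to the loss.
    """
    if not observed:
        return 0.0
    if y == 0.0:
        # Zero branch: only the hurdle probability is scored.
        return -math.log(p_zero)
    # Nonzero branch: probability of clearing the hurdle times the t density.
    return -(math.log1p(-p_zero) + student_t_logpdf(y, loc, scale, df))
```

The heavy tails of the Student-t keep occasional sensor spikes from dominating the loss, while the separate zero branch lets the model assign real probability mass to exact zeros, which a purely continuous density cannot do.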

Abstract

Wastewater treatment plants (WWTPs) need digital-twin-style decision support tools that can simulate plant response under prescribed control plans, tolerate irregular and missing sensing, and remain informative over 12-36 h planning horizons. Meeting these requirements with full-scale plant data remains an open engineering-AI challenge. We present CCSS-RS, a controlled continuous-time state-space model that separates historical state inference from future control and exogenous rollout. The model combines typed context encoding, gain-weighted forcing of prescribed and forecast drivers, semigroup-consistent rollouts, and Student-t plus hurdle outputs for heavy-tailed and zero-inflated WWTP sensor data. On the public Avedøre full-scale benchmark, with 906,815 timesteps, 43% missingness, and 1-20 min irregular sampling, CCSS-RS achieves RMSE 0.696 and CRPS 0.349 at H=1000 across 10,000 test windows. This reduces RMSE by 40-46% relative to Neural CDE baselines and by 31-35% relative to simplified internal variants. Four case studies using a frozen checkpoint on test data demonstrate operational value: oxygen-setpoint perturbations shift predicted ammonium by -2.3 to +1.4 over horizons 300-1000; a smoothed setpoint plan ranks first in multi-criterion screening; context-only sensor outages raise monitored-variable RMSE by at most 10%; and ammonium, nitrate, and oxygen remain more accurate than persistence throughout the rollout. These results establish CCSS-RS as a practical learned simulator for offline scenario screening in industrial wastewater treatment, complementary to mechanistic models.
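
The evaluation quantities named in the abstract (RMSE, CRPS, a persistence baseline, and relative RMSE reduction) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the paper's evaluation code: all function names are hypothetical, missing targets are represented as None, CRPS is estimated from predictive samples with the standard energy form E|X - y| - 0.5 E|X - X'|, and any normalization the paper applies before scoring is not reproduced here.

```python
import math


def masked_rmse(pred, target):
    """RMSE over timesteps whose target is observed (not None)."""
    sq = [(p - t) ** 2 for p, t in zip(pred, target) if t is not None]
    if not sq:
        raise ValueError("no observed targets in this window")
    return math.sqrt(sum(sq) / len(sq))


def persistence_forecast(last_observed, horizon):
    """Baseline: hold the last observed value constant over the horizon."""
    return [last_observed] * horizon


def crps_from_samples(samples, y):
    """Sample-based CRPS estimate for one observation y (lower is better)."""
    n = len(samples)
    spread_to_obs = sum(abs(x - y) for x in samples) / n
    spread_within = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return spread_to_obs - 0.5 * spread_within


def relative_reduction(model_score, baseline_score):
    """Fractional improvement of a model score over a baseline score."""
    return 1.0 - model_score / baseline_score
```

Scoring only observed timesteps matters on a benchmark with 43% missingness, since imputed targets would otherwise leak into the metric. The pairwise CRPS term is quadratic in the number of samples, which is acceptable for a sketch but would need a sorted-sample formulation at scale.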