Accurate and Robust Generative Approach for Overcoming Data Sparsity and Imbalance in Landslide Modeling with A Tabular Foundation Model

arXiv cs.LG / 4/29/2026

Key Points

  • The paper addresses how landslide studies suffer from sparse and imbalanced inventories, which hampers understanding of triggering conditions and failure mechanisms.
  • It proposes generating multi-feature landslide datasets using a tabular foundation model to better capture multivariate dependencies and statistical properties from limited observations.
  • The approach is designed to be more accurate and robust than prior landslide data generation methods, especially when multiple factors interact across scenarios.
  • Experiments across 20 landslide inventories show the generated data match observed distributions, preserve realistic feature relationships, and remain robust across different environmental contexts.
  • The authors argue the method can strengthen landslide susceptibility modeling and risk assessment when observational data are limited.

Abstract

Landslide investigation relies on sufficient and well-balanced observational data on landslides, which are influenced by geological, hydrological, and anthropogenic factors. Available landslide inventories are often sparse and imbalanced, which limits understanding of triggering conditions and failure mechanisms. Data generation offers a way to capture feature dependencies from limited landslide observations. However, existing generation approaches for landslides often struggle to capture complex relationships among features and lack robustness across multiple scenarios and interacting factors. Here, we propose an accurate and robust approach for generating multi-feature landslide datasets by utilizing a tabular foundation model. By leveraging the model's capacity to learn from limited observations, the proposed approach effectively preserves the multivariate dependencies and statistical characteristics inherent in landslide occurrences. Comparative experiments on 20 landslide inventories demonstrate that the generated datasets closely align with observed distributions, maintain realistic feature dependencies, and exhibit robustness across different environmental contexts. This work provides an effective approach to overcome data sparsity and imbalance and strengthens landslide susceptibility modeling and risk assessment under limited observations.
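
The abstract reports that generated data align with observed distributions and keep realistic feature dependencies, but this summary does not say which metrics the authors used. The sketch below shows one common way such checks could be made, and it is an assumption rather than the paper's evaluation: a two-sample Kolmogorov-Smirnov statistic per feature for distribution alignment, and the largest absolute gap between pairwise Pearson correlations for dependency preservation. The feature names `slope_deg` and `rainfall_mm`, the helper names, and the toy data are all hypothetical and exist only to make the example runnable.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The maximum gap between step-function CDFs occurs at a sample point.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)


def fidelity_report(observed, generated):
    """Compare two tables given as dicts of feature name -> list of floats.

    Returns (per-feature KS statistics, largest pairwise correlation gap).
    """
    names = sorted(observed)
    ks = {name: ks_statistic(observed[name], generated[name]) for name in names}
    corr_gap = 0.0
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            diff = abs(pearson(observed[p], observed[q])
                       - pearson(generated[p], generated[q]))
            corr_gap = max(corr_gap, diff)
    return ks, corr_gap


def toy_table(rng, n):
    """Hypothetical two-feature table; NOT data from the paper."""
    slope = [rng.gauss(30.0, 8.0) for _ in range(n)]
    rain = [2.0 * s + rng.gauss(0.0, 10.0) for s in slope]
    return {"slope_deg": slope, "rainfall_mm": rain}


rng = random.Random(0)
observed = toy_table(rng, 200)   # stands in for a real inventory
generated = toy_table(rng, 200)  # stands in for a generator's output
ks, corr_gap = fidelity_report(observed, generated)
print(ks, corr_gap)
```

Lower KS values mean a feature's marginal distribution is closer to the observed one, and a small correlation gap suggests pairwise relationships were preserved. Neither measure captures higher-order dependencies, so they are at best a first screen.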