StackFeat-RL: Reinforcement Learning over Iterative Dual-Criterion Feature Selection for Stable Biomarker Discovery

arXiv cs.LG / 4/28/2026

Key Points

  • The paper introduces StackFeat-RL, a meta-learning framework that uses REINFORCE policy gradients to tune hyperparameters for iterative dual-criterion feature selection in high-dimensional genomic data.
  • The dual criterion requires both coefficient consistency and selection frequency, improving stability and guarding against two failure modes that single-criterion methods miss.
  • Iterative accumulation provides convergence guarantees via the law of large numbers, making the selected feature set more reliable under data perturbation.
  • Experiments on COVID-19 miRNA data and three Alzheimer's disease classification tasks show that StackFeat-RL achieves the highest predictive accuracy among the evaluated methods, including ElasticNet, Boruta, mRMR, and stability selection.
  • It does so while selecting 3–4× fewer features, supporting more compact and potentially more interpretable biomarker panels.
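The dual criterion and iterative accumulation can be illustrated with a small sketch. This summary does not give the paper's exact definitions, so everything below is an assumption:

- Coefficient consistency is read as sign agreement across resampled fits.
- Selection frequency is the fraction of fits in which a coefficient is nonzero.
- The base learner is an elastic net (same objective convention as scikit-learn's `ElasticNet`), fit here by proximal gradient descent (ISTA) so the block needs only NumPy.
- Resampling uses random 80% subsamples.

The function names (`fit_elastic_net`, `dual_criterion`, `accumulate_and_select`) and all thresholds are hypothetical.

```python
import numpy as np


def fit_elastic_net(X, y, alpha=0.5, l1_ratio=0.9, n_iter=500):
    """Elastic net by proximal gradient descent (ISTA), minimising
    (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1 + (alpha*(1 - l1_ratio)/2)*||w||^2.
    """
    n, d = X.shape
    l1 = alpha * l1_ratio
    l2 = alpha * (1.0 - l1_ratio)
    # Step size 1/L, with L the Lipschitz constant of the smooth part's gradient.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n + l2)
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n + l2 * w
        z = w - step * grad
        w = np.sign(z) * np.maximum(np.abs(z) - step * l1, 0.0)  # soft-threshold (L1 prox)
    return w


def dual_criterion(coefs, freq_threshold=0.6, sign_threshold=0.9):
    """coefs: (R, d) coefficients from R resampled fits (assumed representation).
    Keep a feature only if it is selected often AND keeps a consistent sign."""
    n_nonzero = (coefs != 0).sum(axis=0)
    freq = n_nonzero / coefs.shape[0]                      # selection frequency
    n_pos = (coefs > 0).sum(axis=0)
    majority = np.maximum(n_pos, n_nonzero - n_pos)        # count of the majority sign
    consistency = np.divide(majority, n_nonzero,
                            out=np.zeros(coefs.shape[1]), where=n_nonzero > 0)
    keep = (freq >= freq_threshold) & (consistency >= sign_threshold)
    return np.flatnonzero(keep), freq, consistency


def accumulate_and_select(X, y, n_rounds=50, subsample=0.8, alpha=0.5,
                          l1_ratio=0.9, seed=0, **criterion_kwargs):
    """Refit on random subsamples and stack the coefficients. As n_rounds grows, the
    empirical frequencies approach the underlying selection probabilities (LLN)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = int(subsample * n)
    history = []
    for _ in range(n_rounds):
        idx = rng.choice(n, size=m, replace=False)
        history.append(fit_elastic_net(X[idx], y[idx], alpha, l1_ratio))
    return dual_criterion(np.vstack(history), **criterion_kwargs)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((100, 50))  # synthetic data: only features 0 and 1 are informative
    y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * rng.standard_normal(100)
    selected, freq, cons = accumulate_and_select(X, y)
    print("selected features:", selected)
```

Each criterion vetoes a case the other would accept. To see this, pass `dual_criterion` a hand-built coefficient history with two kinds of feature:

- One feature is always nonzero but flips sign, so it passes frequency and fails consistency.
- Another keeps one sign but appears rarely, so it passes consistency and fails frequency.

This is one plausible reading of the "two failure modes" and is not taken from the paper.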

Abstract

Feature selection in high-dimensional genomic data (d ≫ n) demands methods that are simultaneously accurate, sparse, and stable. Existing approaches either require manual threshold specification (mRMR, stability selection), produce unstable selections under data perturbation (Lasso, Boruta), or ignore biological structure entirely. We introduce StackFeat-RL, a meta-learning framework that optimises the hyperparameters of an iterative dual-criterion feature selection algorithm via REINFORCE policy gradients. The dual criterion, requiring both coefficient consistency and selection frequency, guards against two failure modes missed by single-criterion methods, while iterative accumulation provides convergence guarantees via the law of large numbers. On COVID-19 miRNA data (GSE240888, 332 features) and three Alzheimer's disease classification tasks (GSE84422, 13,237 genes; Normal vs. Possible, Probable, and Definite AD), StackFeat-RL achieves the highest predictive accuracy among all evaluated methods, including ElasticNet, Boruta, mRMR, and stability selection, while requiring 3–4× fewer features.

Keywords: feature selection, reinforcement learning, REINFORCE, elastic net, biomarker discovery, Alzheimer's disease, dual-criterion selection, protein interaction networks
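The abstract does not state how the REINFORCE policy is parameterised, what its action space is, or how the reward is computed. The sketch below is a hedged illustration of the REINFORCE update only. It simplifies the setup to a softmax (categorical) policy over a small grid of hyperparameter settings and uses a batch-mean baseline for variance reduction. `CONFIGS`, `TRUE_SCORES`, and `toy_reward` are hypothetical placeholders. In a real pipeline, the reward would presumably be a validation score of a classifier trained on the features selected under the sampled hyperparameters.

```python
import numpy as np

# Hypothetical discrete search space: (elastic-net alpha, selection-frequency threshold).
CONFIGS = [(0.01, 0.5), (0.05, 0.6), (0.1, 0.7), (0.5, 0.8)]

# Hypothetical stand-in for the reward: a fixed "validation score" per config plus noise.
TRUE_SCORES = np.array([0.60, 0.72, 0.85, 0.70])


def toy_reward(action, rng):
    return TRUE_SCORES[action] + rng.normal(0.0, 0.02)


def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def reinforce(n_updates=200, batch_size=16, lr=2.0, seed=0):
    """REINFORCE with a softmax policy over CONFIGS; returns the final action probabilities."""
    rng = np.random.default_rng(seed)
    logits = np.zeros(len(CONFIGS))
    for _ in range(n_updates):
        probs = softmax(logits)
        actions = rng.choice(len(CONFIGS), size=batch_size, p=probs)
        rewards = np.array([toy_reward(a, rng) for a in actions])
        advantages = rewards - rewards.mean()  # batch-mean baseline reduces variance
        grad = np.zeros_like(logits)
        for a, adv in zip(actions, advantages):
            # Gradient of log softmax(logits)[a] w.r.t. logits is onehot(a) - probs.
            score = -probs.copy()
            score[a] += 1.0
            grad += adv * score
        logits += lr * grad / batch_size  # gradient ascent on expected reward
    return softmax(logits)


if __name__ == "__main__":
    probs = reinforce()
    best = int(np.argmax(probs))
    print("policy:", np.round(probs, 3), "best config:", CONFIGS[best])
```

A categorical policy keeps the score-function gradient to one line (`onehot(a) - probs`). A Gaussian policy over continuous hyperparameters would follow the same pattern with a different log-likelihood gradient.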