SCOPE-FE: Structured Control of Operator and Pairwise Exploration for Feature Engineering

arXiv cs.LG / 5/1/2026


Key Points

  • The paper proposes SCOPE-FE, a structured search-space control framework to make automatic feature engineering for tabular data more efficient as dimensionality increases.
  • It addresses combinatorial explosion from operator-feature combinations by jointly regulating both the operator space and the feature-pair candidate space before generating features.
  • OperatorProbing estimates operator utility on the specific dataset and removes low-contribution operators in advance to shrink the search space.
  • FeatureClustering uses spectral embedding and fuzzy c-means clustering to group structurally related features, restricting feature-pair candidates to within-cluster combinations.
  • ReliabilityScoring incorporates variance across subsamples to stabilize pruning decisions.
  • Experiments on ten benchmark datasets show substantial reductions in feature engineering time while maintaining competitive predictive performance, with the largest efficiency gains on high-dimensional datasets.
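
To make the pair-space reduction concrete, here is a minimal sketch of how restricting pairs to within-cluster combinations shrinks the candidate set. It is an illustration under stated assumptions, not the paper's implementation: the feature names, membership degrees, and the 0.3 membership threshold are all hypothetical, and the summary does not say how SCOPE-FE converts fuzzy memberships into cluster assignments.

```python
from itertools import combinations


def all_pairs(n_features):
    """Number of unordered feature pairs with no restriction."""
    return n_features * (n_features - 1) // 2


def within_cluster_pairs(memberships, threshold=0.3):
    """Collect unordered pairs of features that share at least one cluster.

    memberships maps a feature name to its fuzzy membership degree in each
    cluster. A feature counts as a member of every cluster where its degree
    is at least `threshold` (an assumed rule, chosen for illustration).
    """
    n_clusters = len(next(iter(memberships.values())))
    pairs = set()
    for c in range(n_clusters):
        members = sorted(f for f, m in memberships.items() if m[c] >= threshold)
        pairs.update(combinations(members, 2))
    return pairs


# Hypothetical fuzzy memberships for five features over two clusters.
memberships = {
    "age": [0.9, 0.1],
    "income": [0.8, 0.2],
    "balance": [0.5, 0.5],  # soft membership: belongs to both clusters
    "clicks": [0.1, 0.9],
    "visits": [0.2, 0.8],
}

candidate_pairs = within_cluster_pairs(memberships, threshold=0.3)
unrestricted = all_pairs(len(memberships))

# With k binary operators, candidates scale as k * (number of pairs).
n_binary_operators = 4  # hypothetical operator count after pruning
print(unrestricted * n_binary_operators, len(candidate_pairs) * n_binary_operators)
```

In this toy case the unrestricted space has 10 pairs and the within-cluster space has 6, so cross-cluster pairs such as ("age", "clicks") are never generated. The saving grows quickly with dimensionality because the unrestricted count is quadratic in the number of features.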

Abstract

Automatic feature engineering is an effective approach for improving predictive performance in tabular learning. However, expand-and-reduce methods, such as OpenFE, become increasingly computationally expensive as the input dimensionality grows. This limitation arises primarily from the combinatorial explosion of candidate features generated through operator-feature combinations. To address this issue, we propose SCOPE-FE, a structured search space control framework that improves efficiency by reducing the candidate space prior to feature generation. SCOPE-FE jointly regulates two major sources of combinatorial growth: the operator space and feature-pair space. First, OperatorProbing estimates the dataset-specific utility of candidate operators and eliminates low-contribution operators in advance. Second, FeatureClustering employs spectral embedding and fuzzy c-means clustering to group structurally related features, thereby restricting candidate generation to relevant within-cluster combinations. In addition, we introduce ReliabilityScoring, which incorporates variance across subsamples to stabilize pruning decisions. Experiments on ten benchmark datasets demonstrate that SCOPE-FE substantially reduces feature engineering time while maintaining competitive predictive performance relative to existing baselines. The efficiency gains are particularly pronounced for high-dimensional datasets. These results indicate that structured control of the search space is an effective strategy for scalable automatic feature engineering. The code will be made publicly available upon acceptance.
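
The idea of variance-aware pruning can be sketched as follows. The abstract does not give the ReliabilityScoring formula, so this example assumes a simple mean-minus-standard-deviation score; the operator names, gain values, `penalty`, and `min_score` are hypothetical and serve only to show why a variance term can change a pruning decision.

```python
from statistics import mean, pstdev


def reliability_score(gains, penalty=1.0):
    """Assumed score: average gain minus a penalty on its spread across subsamples."""
    return mean(gains) - penalty * pstdev(gains)


def prune_operators(gains_by_operator, min_score=0.0, penalty=1.0):
    """Keep operators whose variance-penalized score exceeds min_score."""
    scores = {
        op: reliability_score(gains, penalty)
        for op, gains in gains_by_operator.items()
    }
    kept = [op for op, score in scores.items() if score > min_score]
    return kept, scores


# Hypothetical validation gains of each operator on four subsamples.
gains_by_operator = {
    "add": [0.02, 0.03, 0.025, 0.02],        # small but consistent gain
    "multiply": [0.10, -0.08, 0.09, -0.07],  # positive mean, very unstable
    "log": [-0.01, 0.0, -0.005, 0.0],        # no gain
}

kept_operators, operator_scores = prune_operators(gains_by_operator)
print(kept_operators)
```

Here "multiply" has a positive mean gain, so a mean-only rule would keep it, but its large spread across subsamples pushes the penalized score below zero and it is pruned. Only "add" survives under these assumed settings.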