LLM-Driven Reasoning for Constraint-Aware Feature Selection in Industrial Systems

arXiv cs.CL / 3/27/2026


Key Points

  • The paper proposes MoFA (Model Feature Agent), a model-driven framework that uses LLM reasoning to perform sequential feature selection in industrial ML systems where labeled data are limited and multiple operational constraints must be satisfied.
  • MoFA leverages structured prompts that combine semantic feature definitions, quantitative importance and correlation signals, and feature metadata (like groups/types) to produce interpretable, constraint-aware feature subsets.
  • Experiments on three real-world industrial applications show gains in prediction accuracy, online engagement, or both, along with reduced feature-group complexity or better inference efficiency.
  • In particular, MoFA finds high-order interaction terms for value/engagement modeling and selects compact, high-value feature sets for notification behavior prediction to boost both accuracy and inference efficiency.
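
To make the structured-prompt idea in the points above more concrete, here is a minimal Python sketch of how feature definitions, importance scores, correlations, and group metadata might be combined into one prompt. The summary does not give MoFA's actual prompt format, so the `Feature` class, the `build_prompt` function, the prompt wording, and the example feature names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    # Hypothetical record; the field names are assumptions, not MoFA's schema.
    name: str
    definition: str    # semantic description of the feature
    importance: float  # quantitative importance score
    group: str         # metadata, e.g. the feature group


def build_prompt(candidates, selected, correlations, max_groups):
    """Assemble a plain-text prompt from semantic and quantitative signals."""
    lines = [
        "Task: pick the next feature to add, or answer STOP.",
        f"Constraint: use at most {max_groups} feature group(s).",
        "Already selected: " + (", ".join(f.name for f in selected) or "(none)"),
        "Candidates:",
    ]
    for f in candidates:
        lines.append(
            f"- {f.name} [group={f.group}, importance={f.importance:.2f}]: {f.definition}"
        )
    lines.append("Pairwise correlations:")
    for (a, b), r in sorted(correlations.items()):
        lines.append(f"- {a} ~ {b}: {r:+.2f}")
    return "\n".join(lines)


# Illustrative data only; these features are invented for the example.
candidates = [
    Feature("dwell_time", "Seconds the user spent on the item", 0.42, "engagement"),
    Feature("hour_of_day", "Local hour when the request was made", 0.17, "context"),
]
correlations = {("dwell_time", "hour_of_day"): 0.08}
prompt = build_prompt(candidates, selected=[], correlations=correlations, max_groups=1)
print(prompt)
```

The resulting text would then be sent to an LLM. That call is left out here because the summary does not say which model or API the authors use.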

Abstract

Feature selection is a crucial step in large-scale industrial machine learning systems, directly affecting model accuracy, efficiency, and maintainability. Traditional feature selection methods rely on labeled data and statistical heuristics, making them difficult to apply in production environments where labeled data are limited and multiple operational constraints must be satisfied. To address this, we propose Model Feature Agent (MoFA), a model-driven framework that performs sequential, reasoning-based feature selection using both semantic and quantitative feature information. MoFA incorporates feature definitions, importance scores, correlations, and metadata (e.g., feature groups or types) into structured prompts and selects features through interpretable, constraint-aware reasoning. We evaluate MoFA in three real-world industrial applications: (1) True Interest and Time-Worthiness Prediction, where it improves accuracy while reducing feature group complexity, (2) Value Model Enhancement, where it discovers high-order interaction terms that yield substantial engagement gains in online experiments, and (3) Notification Behavior Prediction, where it selects compact, high-value feature subsets that improve both model accuracy and inference efficiency. Together, these results demonstrate the practicality and effectiveness of LLM-based reasoning for feature selection in real production systems.
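
The sequential, constraint-aware selection described in the abstract could look roughly like the loop below. This is a sketch under stated assumptions, not the authors' algorithm: the hard constraints (a cap on feature groups and on pairwise correlation), the `select_features` function, and the example data are all hypothetical. The `choose` callback is a placeholder for the LLM's reasoning step, and a simple greedy rule stands in for it so that the example runs on its own.

```python
def select_features(features, correlations, choose, max_features, max_groups, max_corr=0.9):
    """Pick features one at a time, filtering candidates by hard constraints."""

    def corr(a, b):
        # Correlations are keyed by unordered name pairs; missing pairs count as 0.
        return abs(correlations.get(frozenset((a["name"], b["name"])), 0.0))

    selected, remaining = [], list(features)
    while remaining and len(selected) < max_features:
        groups = {f["group"] for f in selected}
        allowed = [
            f for f in remaining
            # Group budget: reuse an existing group or open a new one if allowed.
            if (f["group"] in groups or len(groups) < max_groups)
            # Redundancy: skip candidates too correlated with a selected feature.
            and all(corr(f, s) <= max_corr for s in selected)
        ]
        if not allowed:
            break
        pick = choose(selected, allowed)  # placeholder for the LLM decision
        if pick is None:                  # the chooser may decide to stop early
            break
        selected.append(pick)
        remaining.remove(pick)
    return selected


def greedy_choice(selected, allowed):
    # Stand-in for an LLM: take the candidate with the highest importance.
    return max(allowed, key=lambda f: f["importance"])


# Illustrative data only; these features are invented for the example.
features = [
    {"name": "dwell_time", "group": "engagement", "importance": 0.9},
    {"name": "scroll_depth", "group": "engagement", "importance": 0.8},
    {"name": "session_count", "group": "engagement", "importance": 0.6},
    {"name": "hour_of_day", "group": "context", "importance": 0.5},
    {"name": "device_type", "group": "device", "importance": 0.4},
]
correlations = {frozenset(("dwell_time", "scroll_depth")): 0.95}

chosen = select_features(features, correlations, greedy_choice, max_features=4, max_groups=2)
names = [f["name"] for f in chosen]
print(names)  # ['dwell_time', 'session_count', 'hour_of_day']
```

In this run `scroll_depth` is rejected for being too correlated with `dwell_time`, and `device_type` is rejected because it would open a third feature group, so the loop stops at three features even though four were permitted.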