CONDESION-BENCH: Conditional Decision-Making of Large Language Models in Compositional Action Space

arXiv cs.CL / 4/13/2026


Key Points

  • The paper introduces CONDESION-BENCH to measure how well large language models perform conditional decision-making when actions have compositional structure rather than a fixed candidate list.
  • It models actions as allocations to decision variables and enforces explicit feasibility conditions at multiple levels (variable, contextual, and allocation) to better reflect real-world constraints.
  • The benchmark uses oracle-based evaluation to judge both decision quality and compliance with the specified conditions, aiming for a more rigorous assessment of LLMs in decision-support settings.
  • The work addresses limitations of prior decision-making benchmarks that assume finite action sets and ignore explicit constraints on action validity.

Abstract

Large language models have been widely explored as decision-support tools in high-stakes domains due to their contextual understanding and reasoning capabilities. However, existing decision-making benchmarks rely on two simplifying assumptions: actions are selected from a finite set of pre-defined candidates, and explicit conditions restricting action feasibility are not incorporated into the decision-making process. These assumptions fail to capture the compositional structure of real-world actions and the explicit conditions that constrain their validity. To address these limitations, we introduce CONDESION-BENCH, a benchmark designed to evaluate conditional decision-making in compositional action space. In CONDESION-BENCH, actions are defined as allocations to decision variables and are restricted by explicit conditions at the variable, contextual, and allocation levels. By employing oracle-based evaluation of both decision quality and condition adherence, we provide a more rigorous assessment of LLMs as decision-support tools.
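The three-level feasibility structure described above can be made concrete with a small sketch. This is an illustration of the general idea, not code from the paper: the variable names, domains, and conditions below are hypothetical, invented to show how an allocation over decision variables might be checked at the variable, contextual, and allocation levels.

```python
# Hypothetical example: a compositional action is an allocation of values
# to decision variables, and feasibility is checked at three levels.

# One value per decision variable (the "allocation").
allocation = {"dose_mg": 40, "route": "oral", "frequency_per_day": 2}
# The decision context that conditions may refer to.
context = {"patient_age": 70, "renal_impairment": True}

def variable_level_ok(alloc):
    """Variable-level conditions: each variable lies in its own domain."""
    return 0 < alloc["dose_mg"] <= 80 and alloc["route"] in {"oral", "iv"}

def contextual_ok(alloc, ctx):
    """Contextual conditions: feasibility depends on the decision context."""
    # e.g., cap the dose for patients with renal impairment
    return not (ctx["renal_impairment"] and alloc["dose_mg"] > 40)

def allocation_level_ok(alloc):
    """Allocation-level conditions: joint constraints across variables."""
    # e.g., bound the total daily dose (dose * frequency)
    return alloc["dose_mg"] * alloc["frequency_per_day"] <= 100

def feasible(alloc, ctx):
    """An action is valid only if all three condition levels hold."""
    return (variable_level_ok(alloc)
            and contextual_ok(alloc, ctx)
            and allocation_level_ok(alloc))

print(feasible(allocation, context))  # → True
```

In this framing, an oracle-based evaluator would score both whether a model's proposed allocation is high quality and whether `feasible` holds, rather than checking membership in a fixed candidate list.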