Inclusion-of-Thoughts: Mitigating Preference Instability via Purifying the Decision Space

arXiv cs.AI / 4/8/2026


Key Points

  • The paper addresses how LLMs can become unstable on MCQ tasks when plausible distractors cause oscillation between correct and incorrect preferences.
  • It introduces Inclusion-of-Thoughts (IoT), a progressive self-filtering approach that reconstructs the question using only plausible options to reduce cognitive load.
  • IoT is framed as a controlled way to study the stability of model comparative judgments under distractor perturbations.
  • By explicitly documenting the filtering process, the method aims to improve transparency and interpretability of decision-making.
  • Experiments on arithmetic, commonsense, and educational benchmarks show substantial gains in chain-of-thought performance with minimal added computational overhead.

Abstract

Multiple-choice questions (MCQs) are widely used to evaluate large language models (LLMs). However, LLMs remain vulnerable to the presence of plausible distractors, which often divert attention toward irrelevant choices and result in unstable oscillation between correct and incorrect answers. In this paper, we propose Inclusion-of-Thoughts (IoT), a progressive self-filtering strategy designed to mitigate this cognitive load (i.e., the instability of model preferences in the presence of distractors) and enable the model to focus more effectively on plausible answers. Our method reconstructs the MCQ using only the plausible option choices, providing a controlled setting for examining comparative judgments and, therefore, the stability of the model's internal reasoning under perturbation. By explicitly documenting this filtering process, IoT also enhances the transparency and interpretability of the model's decision-making. Extensive empirical evaluation demonstrates that IoT substantially boosts chain-of-thought performance across a range of arithmetic, commonsense reasoning, and educational benchmarks with minimal computational overhead.
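The abstract describes IoT as progressive self-filtering: score each option's plausibility, drop implausible ones, reconstruct the MCQ from the survivors, and keep a record of each filtering round. The paper does not give implementation details, so the sketch below is only an illustration of that loop under stated assumptions: `score_plausibility` and `answer` are hypothetical stand-ins for LLM calls, and the threshold-based filtering rule is invented for the example.

```python
def iot_answer(question, options, score_plausibility, answer,
               rounds=2, keep_ratio=0.3):
    """Illustrative IoT-style loop (not the paper's exact procedure).

    options: dict mapping option label -> option text.
    score_plausibility(question, text) -> float: stand-in for an LLM
        plausibility judgment of one option.
    answer(reduced_question, remaining_options) -> label: stand-in for
        the final LLM answer on the reconstructed MCQ.
    Returns (chosen_label, trace), where trace documents each filtering
    round, mirroring the transparency claim in the abstract.
    """
    remaining = dict(options)
    trace = []
    for r in range(rounds):
        if len(remaining) <= 2:          # nothing left to filter
            break
        scores = {lab: score_plausibility(question, txt)
                  for lab, txt in remaining.items()}
        # Assumed rule: keep options scoring within keep_ratio of the best.
        cutoff = keep_ratio * max(scores.values())
        kept = {lab: txt for lab, txt in remaining.items()
                if scores[lab] >= cutoff}
        trace.append({"round": r, "scores": scores, "kept": sorted(kept)})
        if kept == remaining:            # filtering has converged
            break
        remaining = kept
    # Reconstruct the MCQ using only the surviving plausible options.
    reduced_q = question + "\n" + "\n".join(
        f"{lab}. {txt}" for lab, txt in sorted(remaining.items()))
    return answer(reduced_q, remaining), trace
```

With toy scoring and answering functions in place of real model calls, a low-scoring distractor is removed before the final answer is produced, and the trace records which options survived each round.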