HeavySkill: Heavy Thinking as the Inner Skill in Agentic Harness

arXiv cs.AI / 5/5/2026

Key Points

  • The paper introduces “HeavySkill,” arguing that the key to agentic orchestration performance is not only the outer system design but also an inner, learnable “heavy thinking” skill embedded in the model parameters.
  • HeavySkill is formulated as a two-stage pipeline, parallel reasoning followed by summarization, that can operate beneath different agentic harness/orchestration frameworks (see the sketch after this list).
  • The authors conduct a systematic set of experiments across multiple domains and find that HeavySkill consistently beats traditional Best-of-N (BoN) strategies, with stronger LLMs nearing Pass@N performance.
  • They show that the depth and breadth of the heavy-thinking ability can be scaled using reinforcement learning, pointing toward more self-evolving LLMs that reduce reliance on fragile orchestration layers.
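
To make the two-stage pipeline concrete, here is a minimal Python sketch of parallel reasoning followed by summarization. The `generate` callable, the prompt wording, and the `n_branches` default are assumptions for illustration only, not the paper's actual interface or prompts.

```python
def heavy_skill(generate, question: str, n_branches: int = 8) -> str:
    """Illustrative two-stage HeavySkill-style pipeline (sketch, not the paper's code).

    Stage 1: sample several independent reasoning traces in parallel.
    Stage 2: have the same model summarize them into one final answer.
    """
    # Stage 1: parallel reasoning -- the "width" of heavy thinking.
    traces = [
        generate(f"Reason step by step about the problem:\n{question}")
        for _ in range(n_branches)
    ]

    # Stage 2: summarization over all traces into a single answer.
    joined = "\n\n".join(f"[Trace {i + 1}]\n{t}" for i, t in enumerate(traces))
    summary_prompt = (
        f"Problem:\n{question}\n\n"
        f"Candidate reasoning traces:\n{joined}\n\n"
        "Synthesize these traces and give one final answer."
    )
    return generate(summary_prompt)
```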

Abstract

Recent advances in agentic harnesses, i.e., orchestration frameworks that coordinate multiple agents with memory, skills, and tool use, have achieved remarkable success in complex reasoning tasks. However, the underlying mechanism that truly drives performance remains obscured behind intricate system designs. In this paper, we propose HeavySkill, a perspective that views heavy thinking not only as a minimal execution unit in an orchestration harness but also as an inner skill, internalized within the model's parameters, that drives the orchestrator to solve complex tasks. We identify this skill as a two-stage pipeline, i.e., parallel reasoning then summarization, which can operate beneath any agentic harness. We present a systematic empirical study of HeavySkill across diverse domains. Our results show that this inner skill consistently outperforms traditional Best-of-N (BoN) strategies; notably, stronger LLMs can even approach Pass@N performance. Crucially, we demonstrate that the depth and width of heavy thinking, as a learnable skill, can be further scaled via reinforcement learning, offering a promising path toward self-evolving LLMs that internalize complex reasoning without relying on brittle orchestration layers.
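
For context on the comparison the abstract draws, the sketch below contrasts the Best-of-N baseline with the Pass@N oracle. The `score` and `is_correct` callables are assumed stand-ins (e.g., a reward model and a ground-truth checker); the paper's actual evaluation protocol is not reproduced here.

```python
def best_of_n(candidates, score):
    """Best-of-N baseline: keep only the single candidate a scorer
    (e.g., a reward model) ranks highest; accuracy is then judged on
    that one answer."""
    return max(candidates, key=score)


def pass_at_n(candidates, is_correct) -> bool:
    """Pass@N: the task counts as solved if *any* of the N candidates
    is correct -- an oracle upper bound that no selection strategy
    over the same N samples can exceed."""
    return any(is_correct(c) for c in candidates)
```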