The Power of Power Law: Asymmetry Enables Compositional Reasoning

arXiv cs.AI / 4/28/2026


Key Points

  • The paper argues that natural-language knowledge and skills follow a power-law distribution, and that, contrary to common intuition, training on power-law-sampled data can outperform training on uniformly sampled data for compositional reasoning tasks (the two sampling schemes are contrasted in the sketch after this list).
  • The reported gains span multiple compositional reasoning settings, including state tracking and multi-step arithmetic, where the model must combine skills across steps.
  • The authors introduce a simplified skill-composition benchmark and prove that power-law training requires substantially less data than uniform training to learn the task.
  • The analysis attributes the advantage to “beneficial asymmetry” from power-law sampling, which improves the loss landscape and helps models first learn frequent skill compositions before efficiently tackling rare long-tail skills.
  • Overall, the work reframes how to choose training data distributions for compositional reasoning, suggesting that non-uniform (power-law) sampling may be inherently more effective than enforcing uniformity.
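To make the contrast concrete, here is a minimal sketch of the two sampling schemes. The skill-inventory size, the Zipf exponent, and the sample counts are illustrative assumptions, not values from the paper.

```python
# A minimal sketch contrasting power-law and uniform skill sampling.
# All parameters here are illustrative assumptions, not the paper's.
import numpy as np

rng = np.random.default_rng(0)
num_skills = 100       # hypothetical size of the skill inventory
alpha = 1.5            # hypothetical power-law (Zipf) exponent

# Power-law sampling: P(skill k) ∝ 1 / k^alpha, so a few head skills dominate.
ranks = np.arange(1, num_skills + 1)
power_law_probs = ranks.astype(float) ** -alpha
power_law_probs /= power_law_probs.sum()

# Uniform sampling: every skill is equally likely.
uniform_probs = np.full(num_skills, 1.0 / num_skills)

power_law_batch = rng.choice(num_skills, size=10_000, p=power_law_probs)
uniform_batch = rng.choice(num_skills, size=10_000, p=uniform_probs)

# Under the power law, head skills appear orders of magnitude more often
# than tail skills; under uniform sampling, all counts are comparable.
print("power-law head counts:", np.bincount(power_law_batch, minlength=num_skills)[:5])
print("uniform head counts:  ", np.bincount(uniform_batch, minlength=num_skills)[:5])
```

The asymmetry visible in the head counts is exactly what the paper's analysis credits with improving the loss landscape: frequent compositions are learned first and then amortize the cost of the tail.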

Abstract

Natural language data follows a power-law distribution, with most knowledge and skills appearing at very low frequency. While a common intuition suggests that reweighting or curating data towards a uniform distribution may help models better learn these long-tail skills, we find a counterintuitive result: across a wide range of compositional reasoning tasks, such as state tracking and multi-step arithmetic, training under power-law distributions consistently outperforms training under uniform distributions. To understand this advantage, we introduce a minimalist skill-composition task and show that learning under a power-law distribution provably requires significantly less training data. Our theoretical analysis reveals that power-law sampling induces a beneficial asymmetry that improves the pathological loss landscape: models first acquire high-frequency skill compositions with low data complexity, and these compositions then serve as stepping stones for efficiently learning rare long-tailed skills. Our results offer an alternative perspective on what constitutes an effective data distribution for training models.
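The abstract does not spell out the minimalist skill-composition task here. As a hedged illustration of what such a task could look like, the sketch below composes randomly chosen permutation "skills" over a small discrete state space (a toy form of state tracking), with the skill chain drawn under a power law. The task construction and every parameter are our own assumptions, not the paper's actual benchmark.

```python
# A hedged sketch of a toy skill-composition / state-tracking task.
# The construction (composing random permutations) and all parameters
# are illustrative assumptions, not the paper's benchmark.
import numpy as np

rng = np.random.default_rng(0)
num_skills, num_states, depth = 16, 8, 4

# Each "skill" is a fixed permutation over a small discrete state space.
skills = [rng.permutation(num_states) for _ in range(num_skills)]

# Hypothetical power-law (Zipf) distribution over skill ranks.
ranks = np.arange(1, num_skills + 1)
zipf = ranks.astype(float) ** -1.5
zipf /= zipf.sum()

def make_example(probs):
    """Draw a chain of `depth` skills from `probs` and track the final state."""
    chain = rng.choice(num_skills, size=depth, p=probs)
    state = int(rng.integers(num_states))
    final = state
    for skill_id in chain:
        final = int(skills[skill_id][final])   # apply skill to current state
    # Input: (skill chain, start state); target: the composed final state.
    return (chain.tolist(), state), final

# Power-law sampling reuses head skills heavily, so frequent compositions
# appear early and often; uniform sampling spreads probability mass thin.
print(make_example(zipf))
```

Under this toy setup, a model trained on power-law-sampled chains would see head-skill compositions far more often, matching the paper's claim that frequent compositions are acquired first and then bootstrap learning of the rare tail.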