Diagnosing Capability Gaps in Fine-Tuning Data
arXiv cs.LG / 5/1/2026
Key Points
- The paper introduces GoalCover, a framework to diagnose capability gaps in fine-tuning datasets before running expensive LLM training by decomposing goals into atomic subgoals and assessing coverage.
- GoalCover assigns LLM-based alignment scores to training samples for each subgoal and uses low-scoring sample explanations to surface which capabilities are missing.
- Controlled corruption experiments on medical QA, legal summarization, and code generation show that GoalCover reliably separates targeted capability degradation from non-targeted impacts (25.6% vs. 2.1% average degradation; Cohen's d = 1.24).
- In a financial-summarization reinforcement fine-tuning task with Qwen-3-14B, filtering data via GoalCover raises LLM-judge reward from 3.77 to 4.12, and combining the filtered data with goal-conditioned synthetic samples performs best (4.20).
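The diagnostic loop described in the key points can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: `judge` stands in for the LLM-based alignment scorer, and the function and threshold names are assumptions for the sake of the example.

```python
# Hypothetical sketch of the GoalCover idea: score every training sample
# against each atomic subgoal, then flag subgoals whose coverage (fraction
# of well-aligned samples) falls below a threshold. The paper uses an
# LLM judge; here `judge` is any callable (sample, subgoal) -> score.

def subgoal_coverage(samples, subgoals, judge, min_score=0.5):
    """Return {subgoal: coverage}, where coverage is the fraction of
    samples whose judge score for that subgoal meets min_score."""
    coverage = {}
    for sg in subgoals:
        scores = [judge(s, sg) for s in samples]
        coverage[sg] = sum(sc >= min_score for sc in scores) / len(scores)
    return coverage

def capability_gaps(coverage, threshold=0.3):
    """Subgoals with coverage below threshold, i.e. likely missing
    capabilities in the fine-tuning dataset."""
    return [sg for sg, cov in coverage.items() if cov < threshold]

# Toy usage with a keyword-matching stand-in for the LLM judge:
samples = ["revenue grew 10% year over year",
           "EBITDA margin improved in Q3",
           "net income fell on higher costs"]
keywords = {"revenue analysis": "revenue", "risk disclosure": "risk"}
judge = lambda s, sg: 1.0 if keywords[sg] in s else 0.0

cov = subgoal_coverage(samples, list(keywords), judge)
gaps = capability_gaps(cov)  # "risk disclosure" has zero coverage
```

A real deployment would replace `judge` with prompted LLM scoring and keep the low-scoring samples' explanations, which is how the framework surfaces *which* capabilities are missing rather than just that coverage is low.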