Agent skills look great in benchmarks but fall apart under realistic conditions, researchers find

THE DECODER / 4/12/2026


Key Points

  • Researchers report that AI “agent skills” (modular, on-demand instructions meant to provide specialized capabilities) show limited benefits when tested under realistic conditions rather than benchmark settings.
  • In experiments covering 34,000 real-world skills, the skills provided little overall benefit in practical scenarios.
  • The study also finds a counterintuitive effect: weaker AI models can perform worse when agent skills are enabled than when they run without those skills.
  • The findings suggest that current skill-based augmentation may be brittle and that evaluation should emphasize real-world conditions to avoid misleading benchmark gains.

AI agents are supposed to tap into specialized knowledge through so-called skills, modular instructions they can pull up on the fly. But a study testing 34,000 real-world skills finds these enhancements barely help under realistic conditions. Weaker models actually perform worse with them than without.
