ADeLe: Predicting and explaining AI performance across tasks

Microsoft Research Blog / 4/2/2026



Key Points

The article argues that current AI/LLM benchmarks typically measure task performance without explaining the underlying capabilities that produce results.
It presents ADeLe, a Microsoft research project (with Princeton University and Universitat Politècnica de València) aimed at predicting and explaining AI performance across a wider set of tasks.
The work targets a key benchmark limitation: the inability to reliably anticipate outcomes on new tasks and to provide dependable explanations for failures.
By focusing on cross-task capability signals, the approach seeks to make model evaluation more interpretable and generalizable rather than task-specific.

AI benchmarks report how large language models (LLMs) perform on specific tasks but provide little insight into the underlying capabilities that drive that performance. They do not explain failures or reliably predict outcomes on new tasks. To address this, Microsoft researchers, in collaboration with Princeton University and Universitat Politècnica de València, introduce ADeLe (AI […]
