Agentic AI for Human Resources: LLM-Driven Candidate Assessment

arXiv cs.AI · March 31, 2026


Key Points

  • The paper proposes a modular, interpretable LLM-based framework for automating recruitment candidate assessment using inputs like job descriptions, CVs, interview transcripts, and HR feedback.
  • It generates role-specific, LLM-produced evaluation rubrics and uses a multi-agent architecture to produce structured reports that aim to closely match expert judgment rather than keyword-based ATS scoring.
  • For ranking, it introduces an LLM-driven active listwise tournament approach that performs mini-tournaments over small candidate subsets and aggregates results with a Plackett-Luce model for coherent, global rankings.
  • The method is designed to be transparent and auditable, producing ranked recommendations and candidate comparisons that can fit real-world hiring workflows.
  • An active-learning loop selects the most informative candidate subsets to improve sample efficiency and reduce noisy or inconsistent ranking from independent pairwise comparisons.
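The aggregation step in the third bullet can be sketched in code. Below is a minimal, self-contained illustration (not the paper's implementation: the function name, gradient-ascent fitting, and hyperparameters are my assumptions) of fitting Plackett-Luce worths from listwise mini-tournament results, where each observed ranking lists candidate ids best-first:

```python
import math

def fit_plackett_luce(rankings, n_items, iters=300, lr=0.2):
    """Estimate Plackett-Luce log-worths from listwise rankings.

    Each ranking is a list of item ids, best first, drawn from some
    subset of the items (a "mini-tournament"). We maximize the PL
    log-likelihood by plain gradient ascent; the model is concave in
    the log-worths, so this converges for a small enough step size.
    """
    theta = [0.0] * n_items  # log-worths, initialized uniformly
    for _ in range(iters):
        grad = [0.0] * n_items
        for ranking in rankings:
            # PL likelihood: at each position, the placed item "wins"
            # against everything still remaining in the subset.
            for i in range(len(ranking) - 1):
                rest = ranking[i:]
                denom = sum(math.exp(theta[j]) for j in rest)
                grad[ranking[i]] += 1.0
                for j in rest:
                    grad[j] -= math.exp(theta[j]) / denom
        for k in range(n_items):
            theta[k] += lr * grad[k] / max(len(rankings), 1)
        # Worths are only identified up to a constant; center them.
        mean = sum(theta) / n_items
        theta = [t - mean for t in theta]
    return theta
```

Sorting candidates by the fitted `theta` then yields the globally coherent ranking the paper describes, even though no single mini-tournament ever compared all candidates at once.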

Abstract

In this work, we present a modular and interpretable framework that uses Large Language Models (LLMs) to automate candidate assessment in recruitment. The system integrates diverse sources, including job descriptions, CVs, interview transcripts, and HR feedback, to generate structured evaluation reports that mirror expert judgment. Unlike traditional ATS tools that rely on keyword matching or shallow scoring, our approach employs role-specific, LLM-generated rubrics and a multi-agent architecture to perform fine-grained, criteria-driven evaluations. The framework outputs detailed assessment reports, candidate comparisons, and ranked recommendations that are transparent, auditable, and suitable for real-world hiring workflows. Beyond rubric-based analysis, we introduce an LLM-Driven Active Listwise Tournament mechanism for candidate ranking. Instead of noisy pairwise comparisons or inconsistent independent scoring, the LLM ranks small candidate subsets (mini-tournaments), and these listwise permutations are aggregated using a Plackett-Luce model. An active-learning loop selects the most informative subsets, producing globally coherent and sample-efficient rankings. This adaptation of listwise LLM preference modeling (previously explored in financial asset ranking) provides a principled and highly interpretable methodology for large-scale candidate ranking in talent acquisition.
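The abstract describes the active-learning loop only at a high level, and the paper's actual selection criterion is not given here. One plausible informativeness heuristic, shown purely as an illustrative sketch, is to send the LLM the subset of candidates whose current Plackett-Luce worth estimates are closest together, since rankings among near-tied candidates resolve the most remaining uncertainty:

```python
from itertools import combinations

def most_informative_subset(theta, k=3):
    """Pick the k candidates whose estimated log-worths are closest
    together (smallest spread). This is an assumed uncertainty proxy,
    not the paper's selection rule: near-tied candidates are the ones
    whose relative order the current model is least sure about.
    """
    best, best_spread = None, float("inf")
    for subset in combinations(range(len(theta)), k):
        vals = [theta[i] for i in subset]
        spread = max(vals) - min(vals)
        if spread < best_spread:
            best, best_spread = subset, spread
    return list(best)

# Example: candidates 2, 3, 4 are nearly tied, so they form the next
# mini-tournament; candidates 0 and 1 are already well separated.
print(most_informative_subset([0.0, 1.5, 5.0, 5.05, 5.1], k=3))
```

The exhaustive search over subsets is fine for small pools; at scale one would restrict the search to candidates adjacent in the current ranking.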