PeopleSearchBench: A Multi-Dimensional Benchmark for Evaluating AI-Powered People Search Platforms
arXiv cs.AI / March 31, 2026
Key Points
- The paper introduces PeopleSearchBench, an open-source benchmark to evaluate AI-powered people search platforms using 119 real-world queries across four use cases (corporate recruiting, B2B prospecting, deterministic expert search, and influencer/KOL discovery).
- It proposes Criteria-Grounded Verification, which extracts explicit, verifiable criteria from each query and uses live web search to produce binary relevance judgments based on factual checks rather than subjective LLM-as-judge scoring.
- The benchmark evaluates systems on three dimensions—Relevance Precision (padded nDCG@10), Effective Coverage (task completion and qualified yield), and Information Utility (profile completeness/usefulness)—and averages them into an overall score.
- In experiments, Lessie is the top-performing agent with an overall score of 65.2 (18.5% ahead of the runner-up) and the only system achieving 100% task completion across all queries.
- The authors publish full artifacts (code, query definitions, prompts, normalization procedures, and results) and include statistical reporting such as confidence intervals and human validation of the verification pipeline (Cohen’s kappa = 0.84).
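The Relevance Precision dimension above is scored with padded nDCG@10 over binary relevance judgments. A minimal sketch of how such a metric might be computed is below, assuming "padded" means result lists shorter than 10 are filled with zero-relevance entries and that the ideal DCG assumes all 10 slots could be relevant (so short or imprecise result lists are penalized); the paper's exact normalization may differ.

```python
import math

def padded_ndcg_at_10(relevances, k=10):
    """Sketch of padded nDCG@k with binary relevance judgments
    (1 = query criteria verified for the candidate, 0 = not).
    Lists shorter than k are padded with zeros, so returning
    too few results lowers the score (assumed behavior)."""
    rels = (list(relevances) + [0] * k)[:k]  # pad or truncate to exactly k
    # DCG: relevance discounted by log2 of (rank + 1), ranks starting at 1
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))
    # Assumed ideal DCG: all k slots filled with relevant results
    idcg = sum(1 / math.log2(i + 2) for i in range(k))
    return dcg / idcg

# A system returning 7 verified candidates out of 10 requested:
score = padded_ndcg_at_10([1, 1, 1, 0, 1, 1, 0, 1, 1])
```

With this normalization, a perfect list of 10 verified candidates scores 1.0, and early-ranked misses cost more than late ones because of the logarithmic discount.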