WorkRB: A Community-Driven Evaluation Framework for AI in the Work Domain

arXiv cs.AI / 4/16/2026

Opinion · Tools & Practical Usage · Models & Research

Key Points

  • WorkRB is introduced as an open-source, community-driven benchmark aimed specifically at evaluating AI systems in the work/labor domain, where research has been fragmented and hard to compare.
  • The framework unifies 13 diverse work-related tasks across 7 task groups into standardized recommendation and NLP task formats, including job/skill and candidate recommendation as well as skill extraction and normalization.
  • WorkRB supports both monolingual and cross-lingual evaluation by dynamically loading multilingual ontologies, helping address the mismatch caused by using different labor taxonomies across studies.
  • It is designed to improve reproducibility while mitigating employment-data sensitivity, with a modular architecture that allows integration of proprietary tasks without exposing sensitive datasets.
  • WorkRB is released under the Apache 2.0 license and is made available via a public GitHub repository, enabling ongoing community contributions.
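
To make the ideas of a unified task format, a pluggable task registry, and language-dependent ontology loading more concrete, here is a minimal Python sketch. It is not WorkRB's actual API: every name in it (`RankingTask`, `TASK_REGISTRY`, `register_task`, `load_labels`, `TOY_ONTOLOGY`, `token_overlap`) is hypothetical, the two-skill ontology is invented for illustration, and mean reciprocal rank is used only as a familiar ranking metric, not as a metric the paper is known to report.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set

# All names below are hypothetical illustrations, not WorkRB's real interface.

# Toy multilingual ontology (invented data): skill id -> language -> label.
TOY_ONTOLOGY = {
    "s1": {"en": "python programming", "nl": "programmeren in python"},
    "s2": {"en": "project management", "nl": "projectmanagement"},
}


def load_labels(lang: str) -> Dict[str, str]:
    """Pick the ontology labels for one language at task-build time."""
    return {skill_id: labels[lang] for skill_id, labels in TOY_ONTOLOGY.items()}


@dataclass
class RankingTask:
    """One shared format: rank candidates for each query."""
    name: str
    queries: Dict[str, str]        # query id -> query text
    candidates: Dict[str, str]     # candidate id -> candidate text
    relevant: Dict[str, Set[str]]  # query id -> ids of relevant candidates


# A registry lets new (possibly private) tasks be plugged in by name.
TASK_REGISTRY: Dict[str, Callable[[str], RankingTask]] = {}


def register_task(name: str):
    def decorator(builder: Callable[[str], RankingTask]):
        TASK_REGISTRY[name] = builder
        return builder
    return decorator


@register_task("job_to_skill")
def build_job_to_skill(target_lang: str) -> RankingTask:
    # English job titles as queries; skill labels in the requested language.
    return RankingTask(
        name=f"job_to_skill[en->{target_lang}]",
        queries={"q1": "python developer", "q2": "project manager"},
        candidates=load_labels(target_lang),
        relevant={"q1": {"s1"}, "q2": {"s2"}},
    )


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens: a deliberately weak lexical baseline."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def mean_reciprocal_rank(task: RankingTask, scorer: Callable[[str, str], float]) -> float:
    """Average of 1/rank of the first relevant candidate per query."""
    total = 0.0
    for qid, qtext in task.queries.items():
        # sorted() is stable, so tied candidates keep their insertion order.
        ranked = sorted(
            task.candidates,
            key=lambda cid: scorer(qtext, task.candidates[cid]),
            reverse=True,
        )
        for rank, cid in enumerate(ranked, start=1):
            if cid in task.relevant[qid]:
                total += 1.0 / rank
                break
    return total / len(task.queries)


if __name__ == "__main__":
    for lang in ("en", "nl"):
        task = TASK_REGISTRY["job_to_skill"](lang)
        print(task.name, mean_reciprocal_rank(task, token_overlap))
```

In this toy setup the same scorer is run against the same task definition with only the target language changed, which is the kind of monolingual versus cross-lingual comparison the key points describe. Because tasks are looked up by name in a registry, a task built on private data could in principle be registered locally and evaluated with the same loop without the data being published; how WorkRB itself achieves this is described in its repository, not here.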

Abstract

Today's evolving labor markets rely increasingly on recommender systems for hiring, talent management, and workforce analytics, with natural language processing (NLP) capabilities at the core. Yet, research in this area remains highly fragmented. Studies employ divergent ontologies (ESCO, O*NET, national taxonomies), heterogeneous task formulations, and diverse model families, making cross-study comparison and reproducibility exceedingly difficult. General-purpose benchmarks lack coverage of work-specific tasks, and the inherent sensitivity of employment data further limits open evaluation. We present WorkRB (Work Research Benchmark), the first open-source, community-driven benchmark tailored to work-domain AI. WorkRB organizes 13 diverse tasks from 7 task groups as unified recommendation and NLP tasks, including job/skill recommendation, candidate recommendation, similar item recommendation, and skill extraction and normalization. WorkRB enables both monolingual and cross-lingual evaluation settings through dynamic loading of multilingual ontologies. Developed within a multi-stakeholder ecosystem of academia, industry, and public institutions, WorkRB has a modular design for seamless contributions and enables integration of proprietary tasks without disclosing sensitive data. WorkRB is available under the Apache 2.0 license at https://github.com/techwolf-ai/WorkRB.