AdaQE-CG: Adaptive Query Expansion for Web-Scale Generative AI Model and Data Card Generation

arXiv cs.AI / 4/14/2026

Key Points

The paper introduces AdaQE-CG, a framework for generating more transparent and standardized model and data cards for web-scale generative AI systems. It targets three weaknesses of existing generators: static query templates, incomplete or inconsistent metadata, and the lack of evaluation benchmarks.
AdaQE-CG's IPE-QE (Intra-Paper Extraction via Context-Aware Query Expansion) module iteratively refines extraction queries so that more complete information is recovered from scientific papers and repositories.
Its ICC-MP (Inter-Card Completion using the MetaGAI Pool) module fills missing card fields by transferring semantically relevant content from similar cards in a curated dataset.
The authors release MetaGAI-Bench, a large-scale, expert-annotated benchmark for evaluating documentation quality. Across five quality dimensions, AdaQE-CG is reported to outperform prior methods, exceed human-authored data cards, and approach human-level quality for model cards.
The code, prompts, and data are published on GitHub to support reproducibility and further research.
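The cross-card completion idea behind ICC-MP can be illustrated with a toy nearest-neighbour sketch. It assumes cards are flat Python dicts with `None` marking a missing field, uses word-count cosine similarity as a crude stand-in for the semantic matching the paper describes, and copies a value verbatim from the most similar pool card that has it. The function names and field names are hypothetical and are not taken from the AdaQE-CG repository.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical card representation: field name -> value, None if missing.
Card = Dict[str, Optional[str]]


def _vector(text: str) -> Counter:
    # Lower-cased word counts: a crude stand-in for a semantic embedding.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def _profile(card: Card) -> Counter:
    # Describe a card by the concatenated text of its filled-in fields.
    return _vector(" ".join(v for v in card.values() if v))


def complete_card(target: Card, pool: List[Card]) -> Card:
    """Fill each missing field of `target` from the most similar pool card that has it."""
    target_vec = _profile(target)
    # Most similar pool cards first; sorted() is stable, so ties keep pool order.
    ranked = sorted(pool, key=lambda c: _cosine(target_vec, _profile(c)), reverse=True)
    completed = dict(target)  # copy so the caller's card is not mutated
    for field, value in target.items():
        if value:
            continue
        for candidate in ranked:
            if candidate.get(field):
                completed[field] = candidate[field]
                break
    return completed  # fields no pool card provides stay None
```

For example, a text-generation card missing `limitations` borrows that field from the pool card with the same task and architecture rather than from an image classifier. Copying verbatim keeps the sketch simple; a real system would likely adapt the transferred text to the target model.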

Abstract

Transparent and standardized documentation is essential for building trustworthy generative AI (GAI) systems. However, existing automated methods for generating model and data cards still face three major challenges: (i) static templates, as most systems rely on fixed query templates that cannot adapt to diverse paper structures or evolving documentation requirements; (ii) information scarcity, since web-scale repositories such as Hugging Face often contain incomplete or inconsistent metadata, leading to missing or noisy information; and (iii) lack of benchmarks, as the absence of standardized datasets and evaluation protocols hinders fair and reproducible assessment of documentation quality. To address these limitations, we propose AdaQE-CG, an Adaptive Query Expansion for Card Generation framework that combines dynamic information extraction with cross-card knowledge transfer. Its Intra-Paper Extraction via Context-Aware Query Expansion (IPE-QE) module iteratively refines extraction queries to recover richer and more complete information from scientific papers and repositories, while its Inter-Card Completion using the MetaGAI Pool (ICC-MP) module fills missing fields by transferring semantically relevant content from similar cards in a curated dataset. In addition, we introduce MetaGAI-Bench, the first large-scale, expert-annotated benchmark for evaluating GAI documentation. Comprehensive experiments across five quality dimensions show that AdaQE-CG substantially outperforms existing approaches, exceeds human-authored data cards, and approaches human-level quality for model cards. Code, prompts, and data are publicly available at: https://github.com/haoxuan-unt2024/AdaQE-CG.
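The iterative query refinement in IPE-QE can be pictured as a retrieve-then-expand loop. This minimal sketch assumes the paper has already been split into text chunks, scores chunks by keyword overlap instead of the paper's actual refinement procedure, adds terms from each newly retrieved chunk to the query, and stops when no unseen chunk matches. `expand_and_extract`, `best_chunk`, and the stopword list are hypothetical.

```python
import re
from collections import Counter
from typing import List, Optional, Set

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on", "the", "to", "was", "we", "with"}


def terms(text: str) -> List[str]:
    """Lower-cased content words of `text`."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def best_chunk(query: Set[str], chunks: List[str]) -> Optional[str]:
    """Return the chunk sharing the most terms with `query`, or None if none overlaps."""
    scored = [(len(query & set(terms(c))), c) for c in chunks]
    score, chunk = max(scored, key=lambda sc: sc[0], default=(0, None))
    return chunk if score > 0 else None


def expand_and_extract(seed: str, chunks: List[str], max_rounds: int = 3, k_new: int = 5) -> List[str]:
    """Retrieve evidence iteratively, expanding the query with terms from each new hit."""
    query = set(terms(seed))
    evidence: List[str] = []
    for _ in range(max_rounds):
        hit = best_chunk(query, [c for c in chunks if c not in evidence])
        if hit is None:  # nothing unseen matches the expanded query: stop early
            break
        evidence.append(hit)
        # Add the most frequent new terms from the hit to the query.
        fresh = Counter(t for t in terms(hit) if t not in query)
        query |= {t for t, _ in fresh.most_common(k_new)}
    return evidence
```

With chunks such as "Trained on 2T tokens of web text." and "The web text corpus was deduplicated and filtered for toxicity.", the seed query "training tokens" matches only the first chunk directly; the terms picked up from it ("web", "text") then pull in the deduplication detail on the next round, which a single fixed query would have missed.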