AI Navigate

Automatic Inter-document Multi-hop Scientific QA Generation

arXiv cs.CL / March 17, 2026

Key Points

  • AIM-SciQA is a new automated framework for generating inter-document, multi-hop scientific QA datasets.
  • It extracts single-hop QAs using large language models (LLMs) with machine reading comprehension, then builds cross-document relations through embedding-based semantic alignment while selectively leveraging citation information.
  • Applied to 8,211 PubMed Central papers, it yields 411,409 single-hop QAs and 13,672 multi-hop QAs, forming the IM-SciQA dataset, with a citation-guided CIM-SciQA variant achieving comparable performance to the Oracle setting.
  • Human and automatic validation confirm high factual consistency, and experiments show the dataset effectively differentiates model capabilities across the retrieval and QA stages, providing a realistic benchmark for retrieval-augmented scientific reasoning.
  • The approach is extensible beyond PubMed Central, reinforcing the dataset's validity and generality across corpora.
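The cross-document linking step above rests on embedding-based semantic alignment: QA pairs from different papers are matched when their embeddings are sufficiently similar. The paper does not publish its implementation details, so the following is only a minimal sketch of that idea, using toy vectors in place of real sentence-encoder output and an illustrative similarity threshold; the function names are hypothetical.

```python
import numpy as np

def cosine_sim_matrix(a, b):
    """Pairwise cosine similarity between two sets of row vectors."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def align_qa_pairs(emb_doc1, emb_doc2, threshold=0.8):
    """Return (i, j, score) triples for QA embeddings from two documents
    whose cosine similarity meets the threshold, i.e. candidate
    cross-document links for composing multi-hop questions."""
    sims = cosine_sim_matrix(emb_doc1, emb_doc2)
    return [(i, j, float(sims[i, j]))
            for i in range(sims.shape[0])
            for j in range(sims.shape[1])
            if sims[i, j] >= threshold]

# Toy embeddings standing in for a real sentence encoder's output.
doc1 = np.array([[1.0, 0.0, 0.1],
                 [0.0, 1.0, 0.0]])
doc2 = np.array([[0.9, 0.1, 0.1],
                 [0.0, 0.1, 1.0]])
links = align_qa_pairs(doc1, doc2, threshold=0.8)
# Only the first QA of each document aligns; the other pairs fall
# below the threshold and are discarded.
```

In a real pipeline the rows of `doc1` and `doc2` would come from a sentence encoder applied to each single-hop question-answer pair, and the surviving links would be further filtered by the selective citation signal the paper describes.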

Abstract

Existing automatic scientific question generation studies mainly focus on single-document factoid QA, overlooking the inter-document reasoning crucial for scientific understanding. We present AIM-SciQA, an automated framework for generating multi-document, multi-hop scientific QA datasets. AIM-SciQA extracts single-hop QAs using large language models (LLMs) with machine reading comprehension and constructs cross-document relations based on embedding-based semantic alignment while selectively leveraging citation information. Applied to 8,211 PubMed Central papers, it produced 411,409 single-hop and 13,672 multi-hop QAs, forming the IM-SciQA dataset. Human and automatic validation confirmed high factual consistency, and experimental results demonstrate that IM-SciQA effectively differentiates reasoning capabilities across retrieval and QA stages, providing a realistic and interpretable benchmark for retrieval-augmented scientific reasoning. We further extend this framework to construct CIM-SciQA, a citation-guided variant achieving comparable performance to the Oracle setting, reinforcing the dataset's validity and generality.