SciCoQA: Quality Assurance for Scientific Paper-Code Alignment

arXiv cs.CL / 3/27/2026

Key Points

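SciCoQA is a dataset for detecting discrepancies between scientific papers and their corresponding codebases (mismatches between what the paper describes and what the code implements), with the goal of supporting faithful implementations.
SciCoQA builds its real-world data from GitHub issues and reproducibility papers, and also proposes a synthetic data generation method for scaling up the set of paper-code discrepancies.
The full dataset contains 635 discrepancies (92 real, 543 synthetic); beyond the AI domain, synthetic data extends coverage to computational sciences such as physics and quantitative biology.
An evaluation of 22 LLMs shows that SciCoQA is especially difficult for cases where important details are omitted from the paper, for long contexts, and for data outside the models' pre-training corpus.
Even the best models detect only 46.7% of real-world discrepancies, suggesting that verifying paper-code consistency remains a hard problem.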

Abstract

We present SciCoQA, a dataset for detecting discrepancies between scientific publications and their codebases to ensure faithful implementations. We construct SciCoQA from GitHub issues and reproducibility papers, and to scale our dataset, we propose a synthetic data generation method for constructing paper-code discrepancies. We analyze the paper-code discrepancies in detail and propose discrepancy types and categories to better understand the occurring mismatches. In total, our dataset consists of 635 paper-code discrepancies (92 real, 543 synthetic), covering the AI domain from real-world data and extending to Physics, Quantitative Biology, and other computational sciences through synthetic data. Our evaluation of 22 LLMs demonstrates the difficulty of SciCoQA, particularly for instances involving omitted paper details, long-context inputs, and data outside the models' pre-training corpus. The best-performing models in our evaluation, Gemini 3.1 Pro and GPT-5 Mini, detect only 46.7% of real-world paper-code discrepancies.
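To make the reported detection figure concrete, the following is a minimal sketch of how a detection rate over discrepancy records could be computed. It is not the authors' evaluation protocol: the `Discrepancy` schema, its field names, the `detection_rate` helper, and the toy records are all hypothetical and invented for illustration. Only the counts 92, 543, and 635 come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discrepancy:
    """One paper-code discrepancy record (hypothetical schema, not the dataset's)."""
    paper_id: str
    origin: str     # "real" or "synthetic"
    detected: bool  # whether a model flagged this discrepancy


def detection_rate(records, origin=None):
    """Fraction of records (optionally filtered by origin) that were detected."""
    selected = [r for r in records if origin is None or r.origin == origin]
    if not selected:
        raise ValueError("no records match the requested origin")
    return sum(r.detected for r in selected) / len(selected)


# Toy records, invented for illustration only (not SciCoQA data).
toy = [
    Discrepancy("paper-a", "real", True),
    Discrepancy("paper-b", "real", False),
    Discrepancy("paper-c", "synthetic", True),
    Discrepancy("paper-d", "synthetic", True),
]

print(f"real: {detection_rate(toy, 'real'):.1%}")
print(f"all:  {detection_rate(toy):.1%}")

# Dataset composition as reported in the abstract: 92 real + 543 synthetic.
REAL, SYNTHETIC = 92, 543
assert REAL + SYNTHETIC == 635
```

Filtering by `origin` mirrors the abstract's distinction between real-world and synthetic discrepancies; the 46.7% figure is reported for the real-world subset only.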

Related Articles

GDPR and AI Training Data: What You Need to Know Before Training on Personal Data

Dev.to

Edge-to-Cloud Swarm Coordination for heritage language revitalization programs with embodied agent feedback loops

Dev.to

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

Dev.to

AI Crawler Management: The Definitive Guide to robots.txt for AI Bots

Dev.to

Data Sovereignty Rules and Enterprise AI

Dev.to
