Curated AI beats frontier LLMs at pharma asset discovery

arXiv cs.AI / 5/7/2026

Key Points

  • The study (arXiv:2605.04908v1) benchmarks Gosset, an AI platform using curated drug-asset annotations, against four frontier web-enabled LLM systems for pharmaceutical asset discovery.
  • With identical natural-language queries and a shared JSON output schema, Gosset finds 3.2× more verified drugs per query than the best frontier model across 10 niche oncology/immunology targets.
  • Gosset achieves 100% precision and 100% recall, with recall measured against the union of verified drugs found across the compared systems, indicating strong coverage without false positives.
  • The same curated index is exposed through a Gosset MCP server as a tool that any frontier model can call, suggesting that frontier LLMs could close most of the recall gap by querying a curated index instead of relying on generic web search.

Abstract

General-purpose LLMs with web search are increasingly used to scout the competitive landscape of pharmaceutical pipelines. We benchmark Gosset -- an AI platform with a chat interface backed by curated target-, modality-, and indication-level drug-asset annotations -- against four frontier systems with web access (Claude Opus 4.7, GPT 5.5, Gemini 3.1 Pro, Perplexity sonar-pro) on ten niche oncology/immunology targets where most of the pipeline lives in the long tail of preclinical and Asian-developed assets. All five systems receive the same natural-language query and the same JSON output schema. Across 10 targets Gosset returns 3.2x more verified drugs per query than the best frontier system, at perfect precision and 100% recall against the cross-system union of verified drugs. The same curated index is exposed as a Gosset MCP server that any frontier model can call as a tool, suggesting that each of these systems can close most of the recall gap by swapping generic web search for a curated index behind the same chat interface.
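
The precision and recall figures above are relative to a cross-system union, which is worth making concrete. The sketch below is a minimal illustration of that kind of scoring, not the paper's actual evaluation code: the function name `score_against_union`, the system labels, and the toy drug names are all hypothetical, and the verification step is assumed to have already produced a set of verified drug names.

```python
from typing import Dict, Set


def score_against_union(
    returned: Dict[str, Set[str]], verified: Set[str]
) -> Dict[str, Dict[str, float]]:
    """Score each system for one target.

    returned: system label -> set of drug names that system returned.
    verified: drug names confirmed as real assets for the target
              (the verification procedure itself is assumed, not shown).
    """
    # Reference set for recall: every verified drug found by at least one system.
    union: Set[str] = set()
    for drugs in returned.values():
        union |= drugs & verified

    scores: Dict[str, Dict[str, float]] = {}
    for system, drugs in returned.items():
        hits = drugs & verified
        scores[system] = {
            "verified": float(len(hits)),
            # Precision: share of a system's answers that are verified.
            "precision": len(hits) / len(drugs) if drugs else 0.0,
            # Recall: share of the cross-system union that the system found.
            "recall": len(hits) / len(union) if union else 0.0,
        }
    return scores


# Toy data only; these are placeholders, not results from the paper.
returned = {
    "curated": {"drug_a", "drug_b", "drug_c", "drug_d"},
    "llm_1": {"drug_a", "drug_b", "not_a_drug"},
    "llm_2": {"drug_c"},
}
verified = {"drug_a", "drug_b", "drug_c", "drug_d"}

scores = score_against_union(returned, verified)

# Ratio of verified drugs: curated system vs. the best of the other systems.
best_other = max(s["verified"] for name, s in scores.items() if name != "curated")
ratio = scores["curated"]["verified"] / best_other
```

Because the reference set is built from what the systems themselves returned, recall here is relative: a system that finds every verified drug any system found scores 100%, even if further assets exist that no system surfaced.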