"Excuse me, may I say something..." CoLabScience, A Proactive AI Assistant for Biomedical Discovery and LLM-Expert Collaborations

arXiv cs.AI / 4/20/2026

Opinion | Developer Stack & Infrastructure | Models & Research

Key Points

  • The paper presents CoLabScience, a proactive LLM assistant aimed at improving biomedical discovery by enabling timely, context-aware interventions during collaborative workflows rather than only responding to prompts.
  • It introduces PULI (Positive-Unlabeled Learning-to-Intervene), a framework trained with a reinforcement learning objective that decides when and how to intervene in streaming scientific discussions, drawing on the team's project proposal plus short- and long-term conversational memory.
  • The authors also release BSDD, a new benchmark dataset of simulated biomedical streaming dialogues with intervention points derived from PubMed articles.
  • Experiments indicate PULI delivers higher intervention precision and better collaborative task utility than existing baselines, suggesting proactive LLMs could be effective scientific partners.
  • Overall, the work positions proactive LLM behavior as a key step toward more autonomous and useful AI support in biomedical research collaboration.

Abstract

The integration of Large Language Models (LLMs) into scientific workflows presents exciting opportunities to accelerate biomedical discovery. However, the reactive nature of LLMs, which respond only when prompted, limits their effectiveness in collaborative settings that demand foresight and autonomous engagement. In this study, we introduce CoLabScience, a proactive LLM assistant designed to enhance biomedical collaboration between AI systems and human experts through timely, context-aware interventions. At the core of our method is PULI (Positive-Unlabeled Learning-to-Intervene), a novel framework trained with a reinforcement learning objective to determine when and how to intervene in streaming scientific discussions, by leveraging the team's project proposal and long- and short-term conversational memory. To support this work, we introduce BSDD (Biomedical Streaming Dialogue Dataset), a new benchmark of simulated research discussion dialogues with intervention points derived from PubMed articles. Experimental results show that PULI significantly outperforms existing baselines in both intervention precision and collaborative task utility, highlighting the potential of proactive LLMs as intelligent scientific assistants.
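
To make the "when to intervene" idea concrete, the following is a minimal toy sketch, not the authors' method. The abstract does not describe PULI's architecture, so everything here is an assumption for illustration: the class name `InterventionGate`, the `short_window` and `threshold` parameters, and above all the scoring rule, which swaps the learned policy for a simple token-overlap heuristic. It only shows the general shape of a streaming loop that consults a project proposal, a short-term window of recent utterances, and a long-term store of older ones.

```python
from collections import deque
from dataclasses import dataclass, field


def _tokens(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for a learned text encoder."""
    return {w.strip(".,;:!?()\"'").lower() for w in text.split()} - {""}


@dataclass
class InterventionGate:
    """Hypothetical gate: flags when recent talk drifts from known context."""
    proposal: str
    short_window: int = 3      # utterances kept in short-term memory (assumed)
    threshold: float = 0.5     # decision threshold (assumed, not from the paper)
    short_term: deque = field(init=False)
    long_term: list = field(default_factory=list)

    def __post_init__(self) -> None:
        self.short_term = deque(maxlen=self.short_window)

    def score(self) -> float:
        """Fraction of recent tokens absent from the proposal and long-term memory."""
        if not self.short_term:
            return 0.0
        recent = set().union(*(_tokens(u) for u in self.short_term))
        if not recent:
            return 0.0
        known = _tokens(self.proposal)
        for old in self.long_term:
            known |= _tokens(old)
        return len(recent - known) / len(recent)

    def observe(self, utterance: str) -> bool:
        """Consume one utterance from the stream; return True to intervene."""
        # Move the oldest utterance to long-term memory before it is evicted.
        if len(self.short_term) == self.short_term.maxlen:
            self.long_term.append(self.short_term[0])
        self.short_term.append(utterance)
        return self.score() >= self.threshold


gate = InterventionGate(
    proposal="test whether gene X knockout reduces tumor growth in mice",
    short_window=2,
)
stream = [
    "gene X knockout in mice",
    "tumor growth reduces",
    "budget travel conference hotel",
]
print([gate.observe(u) for u in stream])  # [False, False, True]
```

In this toy run the first two utterances stay within the proposal's vocabulary, so the gate stays silent; the third introduces four unseen tokens out of seven in the recent window, which crosses the assumed threshold. A real system would presumably replace the overlap heuristic with a trained policy and would also decide how to intervene, which this sketch omits.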