BioMiner: A Multi-modal System for Automated Mining of Protein-Ligand Bioactivity Data from Literature
arXiv cs.AI / 4/25/2026
Key Points
- The paper introduces BioMiner, a multi-modal framework that automates extraction of protein–ligand bioactivity data from scientific literature by explicitly separating “bioactivity semantics” from “ligand structure” reconstruction.
- BioMiner infers bioactivity meaning via direct reasoning, while ligand structures are resolved through chemically grounded visual semantic reasoning using multi-modal LLMs, with exact molecular construction handled by chemistry domain tools.
- It also presents BioVista, a benchmark dataset containing 16,457 curated bioactivity entries from 500 publications, enabling rigorous evaluation and development.
- BioMiner reports an F1 score of 0.32 on bioactivity triplets and demonstrates practical value in three use cases: (1) building a pre-training database that yields a 3.9% downstream improvement; (2) a human-in-the-loop workflow that improves NLRP3 data quality, with a reported 38.6% gain against QSAR baselines and 16 hit candidates with novel scaffolds; and (3) accelerating protein–ligand bioactivity annotation, which becomes 5.59× faster with a 5.75% accuracy gain.
- Overall, the work addresses a key bottleneck in automated bioactivity extraction by combining semantic understanding across text/tables/figures with chemistry-grounded structure reconstruction.
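
To make the triplet-level F1 figure above concrete, here is a minimal sketch of how such a score could be computed. It is not the paper's evaluation code: the `Triplet` type, its field names, the `triplet_f1` function, and the exact-match rule are all assumptions made for illustration, and the summary does not say how BioMiner decides that two triplets match (for example, whether ligands are compared by structure equivalence). The example data is hypothetical.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    """Hypothetical (protein, ligand, activity) record; field names are assumptions."""
    protein: str   # target name or identifier
    ligand: str    # ligand identifier, e.g. a canonical SMILES string
    activity: str  # measurement normalized to a string, e.g. "IC50=12 nM"


def triplet_f1(predicted: set[Triplet], reference: set[Triplet]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) assuming exact matching on all three fields."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: three curated triplets, two extracted, one of them correct.
reference = {
    Triplet("ProteinA", "CCO", "IC50=12 nM"),
    Triplet("ProteinA", "CCN", "IC50=40 nM"),
    Triplet("ProteinB", "CCO", "Ki=3 nM"),
}
predicted = {
    Triplet("ProteinA", "CCO", "IC50=12 nM"),   # matches a reference entry
    Triplet("ProteinB", "CCC", "Ki=3 nM"),      # wrong ligand, counts as a miss
}

precision, recall, f1 = triplet_f1(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under exact matching, a triplet counts only if the protein, the ligand, and the activity value are all correct, so one wrong field turns an otherwise good extraction into both a false positive and a false negative. That strictness is one reason triplet-level scores tend to be much lower than per-field scores.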