AOP-Smart: A RAG-Enhanced Large Language Model Framework for Adverse Outcome Pathway Analysis

arXiv cs.CL / 4/14/2026


Key Points

  • The paper introduces AOP-Smart, an AOP-oriented Retrieval-Augmented Generation (RAG) framework designed to improve reliability in Adverse Outcome Pathway (AOP) question answering and mechanistic reasoning.
  • AOP-Smart uses official AOP-Wiki XML data to retrieve relevant knowledge based on Key Events (KEs), Key Event Relationships (KERs), and AOP-specific information, aiming to reduce LLM hallucinations.
  • The authors evaluate the approach on 20 AOP-related QA tasks spanning KE identification and both simple and complex retrieval across upstream/downstream relationships.
  • Experiments across Gemini, DeepSeek, and ChatGPT show large accuracy gains when using RAG versus no-RAG (e.g., GPT from 15% to 95%, DeepSeek from 35% to 100%, Gemini from 20% to 95%).

Abstract

Adverse Outcome Pathways (AOPs) are an important knowledge framework in toxicological research and risk assessment. In recent years, large language models (LLMs) have increasingly been applied to AOP-related question answering and mechanistic reasoning tasks. However, their reliability remains limited by hallucination: models may generate content that is inconsistent with the facts or lacks supporting evidence. To address this issue, this study proposes an AOP-oriented Retrieval-Augmented Generation (RAG) framework, AOP-Smart. Building on the official XML data from AOP-Wiki, the method uses Key Events (KEs), Key Event Relationships (KERs), and AOP-specific information to retrieve knowledge relevant to user questions, thereby improving the reliability of LLM-generated answers. To evaluate the proposed method, the study constructed a test set of 20 AOP-related question answering tasks covering KE identification, upstream and downstream KE retrieval, and complex AOP retrieval. Experiments were conducted on three mainstream LLMs, Gemini, DeepSeek, and ChatGPT, under two settings: without RAG and with RAG. Without RAG, the accuracies of GPT, DeepSeek, and Gemini were 15.0%, 35.0%, and 20.0%, respectively; with RAG, their accuracies rose to 95.0%, 100.0%, and 95.0%. The results indicate that AOP-Smart can significantly alleviate the hallucination problem of LLMs on AOP knowledge tasks and greatly improve the accuracy and consistency of their answers.
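The paper does not publish code, but the retrieval step it describes (parsing AOP-Wiki XML into KEs and KERs, then selecting entries relevant to a question and prepending them as context) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the XML snippet uses a simplified, hypothetical stand-in schema (the real AOP-Wiki export is much richer), and `retrieve`/`build_prompt` are invented names using naive keyword matching in place of whatever retrieval AOP-Smart actually performs.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for an AOP-Wiki XML export.
AOP_XML = """
<data>
  <key-event id="KE1"><title>Oxidative stress</title></key-event>
  <key-event id="KE2"><title>Mitochondrial dysfunction</title></key-event>
  <key-event-relationship id="KER1">
    <upstream>KE1</upstream><downstream>KE2</downstream>
  </key-event-relationship>
</data>
"""

def build_index(xml_text):
    """Parse the XML into a KE id->title map and a list of (upstream, downstream) KER pairs."""
    root = ET.fromstring(xml_text)
    kes = {ke.get("id"): ke.findtext("title") for ke in root.iter("key-event")}
    kers = [(r.findtext("upstream"), r.findtext("downstream"))
            for r in root.iter("key-event-relationship")]
    return kes, kers

def retrieve(question, kes, kers):
    """Naive keyword retrieval: keep KEs whose title words all appear in the
    question, plus any KER touching a retrieved KE (covers upstream/downstream lookups)."""
    q = question.lower()
    hits = {kid: title for kid, title in kes.items()
            if all(word in q for word in title.lower().split())}
    rels = [(u, d) for u, d in kers if u in hits or d in hits]
    return hits, rels

def build_prompt(question, kes, kers):
    """Prepend the retrieved KE/KER facts as grounding context for the LLM."""
    hits, rels = retrieve(question, kes, kers)
    lines = [f"{kid}: {title}" for kid, title in hits.items()]
    lines += [f"{u} -> {d}" for u, d in rels]
    context = "\n".join(lines)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

For a question such as "What happens downstream of oxidative stress?", this sketch retrieves KE1 and the KE1 -> KE2 relationship, so the model answers from retrieved AOP-Wiki facts rather than from parametric memory, which is the hallucination-reduction mechanism the paper evaluates.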