AgriPestDatabase-v1.0: A Structured Insect Dataset for Training Agricultural Large Language Model

arXiv cs.AI / 3/25/2026

Key Points

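The paper, posted on arXiv, presents AgriPestDatabase-v1.0, a structured insect dataset for agricultural pest management, built from information on nine pest species collected from databases and published papers and validated by a domain expert.
Q/A pairs were constructed from these structured reports, and lightweight LLMs (7B or smaller) were fine-tuned with LoRA and evaluated on a domain-specific Q/A task for agricultural pest management.
In the initial evaluation, Mistral 7B reportedly achieved an 88.9% pass rate, well ahead of Qwen 2.5 7B (63.9%) and LLaMA 3.1 8B (58.7%).
The combination of a high embedding similarity (0.865) with a low BLEU score suggests that semantic understanding and reasoning quality may matter more for success on specialized tasks than surface-level lexical overlap.
By combining expert data with quality control, the work points to the possibility of using compact, high-performing language models on edge devices for field-level support.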
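
The contrast between lexical overlap and semantic similarity can be illustrated with a minimal Python sketch. It is not the paper's evaluation code: `ngram_precision` computes only clipped n-gram precision (the building block of BLEU, not the full metric), the two example sentences are invented, and the embedding vectors are hypothetical placeholders for what a sentence-embedding model would produce.

```python
import math
from collections import Counter


def ngram_precision(candidate: str, reference: str, n: int = 1) -> float:
    """Clipped n-gram precision, the building block of BLEU (not full BLEU)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(cand_ngrams.values())
    if total == 0:
        return 0.0
    # Credit each candidate n-gram at most as often as it occurs in the reference.
    matched = sum(min(count, ref_ngrams[gram]) for gram, count in cand_ngrams.items())
    return matched / total


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Invented example: a reference answer and a paraphrase with the same meaning.
reference = "apply insecticide when larvae first appear on the leaves"
paraphrase = "treat the crop as soon as young caterpillars are seen on foliage"

print("unigram precision:", ngram_precision(paraphrase, reference, n=1))
print("bigram precision:", ngram_precision(paraphrase, reference, n=2))

# Hypothetical embedding vectors, made up for illustration only.
# In practice these would come from a sentence-embedding model.
ref_vec = [0.8, 0.1, 0.6]
par_vec = [0.7, 0.2, 0.6]
print("cosine similarity:", cosine_similarity(ref_vec, par_vec))
```

The paraphrase shares only two of its twelve tokens with the reference and no bigrams, so an overlap-based score is low even though the advice is equivalent. A similarity computed on embeddings can still be high in that situation.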

Abstract

Agricultural pest management increasingly relies on timely and accurate access to expert knowledge, yet high-quality labeled data and continuous expert support remain limited, particularly for farmers operating in rural regions with unstable or no internet connectivity. At the same time, the rapid growth of AI and LLMs has created new opportunities to deliver practical decision support tools directly to end users in agriculture through compact and deployable systems. This work addresses (i) generating a structured insect information dataset, and (ii) adapting a lightweight LLM (≤ 7B) by fine-tuning it for edge-device use in agricultural pest management. Textual data were collected by reviewing available pest databases and published manuscripts on nine selected pest species. The resulting structured reports were then reviewed and validated by a domain expert. From these reports, we constructed Q/A pairs to support model training and evaluation. A LoRA-based fine-tuning approach was applied to multiple lightweight LLMs and evaluated. Initial evaluation shows that Mistral 7B achieves an 88.9% pass rate on the domain-specific Q/A task, substantially outperforming Qwen 2.5 7B (63.9%) and LLaMA 3.1 8B (58.7%). Notably, Mistral demonstrates higher semantic alignment (embedding similarity: 0.865) despite lower lexical overlap (BLEU: 0.097), indicating that semantic understanding and robust reasoning are more predictive of task success than surface-level conformity in specialized domains. By combining expert-organized data, well-structured Q/A pairs, semantic quality control, and efficient model adaptation, this work contributes to farmer-facing agricultural decision support tools and demonstrates the feasibility of deploying compact, high-performing language models for practical field-level pest management guidance.
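
To make the efficiency argument behind LoRA concrete, here is a minimal pure-Python sketch of the low-rank update it adds to a frozen weight matrix, and of how few parameters that update trains. The abstract does not report the rank, scaling factor, or target layers, so the values below (a 4096 × 4096 projection, rank 8) are illustrative assumptions, and the helper names are hypothetical rather than taken from any library.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha: float, rank: int):
    """Compute W x + (alpha / rank) * B (A x); W stays frozen, only A and B train."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


# Illustrative sizes only (assumed, not from the paper).
d, r = 4096, 8
full = d * d
lora = lora_trainable_params(d, d, r)
print(f"full matrix: {full} params, LoRA factors: {lora} params "
      f"({lora / full:.4%} of full)")

# Tiny example: with B initialised to zeros, the adapted layer
# reproduces the frozen layer exactly at the start of training.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, -0.5]]      # rank 1, shape (1, 2)
B = [[0.0], [0.0]]     # shape (2, 1), zero-initialised
x = [1.0, 0.0]
print(lora_forward(W, A, B, x, alpha=2.0, rank=1))
```

Because only the two small factors are updated, the memory and compute cost of adaptation is a small fraction of full fine-tuning, which is one reason the approach suits compact models intended for edge devices.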
