Evaluating Digital Inclusiveness of Digital Agri-Food Tools Using Large Language Models: A Comparative Analysis Between Human and AI-Based Evaluations

arXiv cs.CL / 4/7/2026


Key Points

  • The paper examines how to evaluate the digital inclusiveness of digital agri-food tools in the Global South, using the MDII framework as a baseline for expert-led assessment.
  • It benchmarks four LLMs (Grok, Gemini, GPT-4o, and GPT-5) to see whether AI-enabled evaluations can approximate human expert scores while completing far faster than the current MDII process.
  • Results indicate that LLMs can produce evaluative outputs that resemble expert judgment in some dimensions, but accuracy and reliability vary by model and evaluation context.
  • The study analyzes factors affecting performance, including temperature sensitivity and potential bias sources, highlighting the need for caution when using GenAI for inclusion monitoring.
  • Overall, it offers exploratory evidence for integrating GenAI into digital development monitoring of agritools in time-sensitive or resource-constrained settings, while treating it as a complement to, rather than a replacement for, expert evaluation.

Abstract

Ensuring digital inclusiveness is a critical priority in agri-food systems, particularly in the Global South, where digital divides persist. The Multidimensional Digital Inclusiveness Index (MDII) offers a comprehensive, human-led framework to assess how inclusive digital agricultural tools (agritools) are. However, the current evaluation process is resource intensive, often requiring months to complete. This study explores whether large language models (LLMs) can support a rapid, AI-enabled assessment of digital inclusiveness, complementing the MDII's existing workflow. Using a comparative analysis, the research benchmarks the performance of four LLMs (Grok, Gemini, GPT-4o, and GPT-5) against prior expert-led evaluations. The study investigates model alignment with human scores, sensitivity to temperature settings, and potential sources of bias. Findings suggest that LLMs can generate evaluative outputs that approximate expert judgment in some dimensions, though reliability varies across models and contexts. This exploratory work provides early evidence for the integration of GenAI into inclusive digital development monitoring, with implications for scaling evaluations in time-sensitive or resource-constrained environments.
