Evaluating Digital Inclusiveness of Digital Agri-Food Tools Using Large Language Models: A Comparative Analysis Between Human and AI-Based Evaluations
arXiv cs.CL / 4/7/2026
Key Points
- The paper examines how to evaluate the inclusiveness of digital agri-food tools in the Global South, using the expert-led Multidimensional Digital Inclusiveness Index (MDII) framework as the baseline.
- It benchmarks four LLMs (Grok, Gemini, GPT-4o, and GPT-5) to test whether AI-based evaluations can approximate human expert scores while taking less time than the current MDII process.
- Results indicate that LLMs can produce evaluative outputs that resemble expert judgment in some dimensions, but accuracy and reliability vary by model and evaluation context.
- The study analyzes factors affecting performance, including temperature sensitivity and potential bias sources, highlighting the need for caution when using GenAI for inclusion monitoring.
- Overall, it offers exploratory evidence that GenAI can speed up monitoring of digital agri-food tools in resource-constrained digital development settings, provided it is treated as a complement to expert assessment rather than a full replacement.
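As a rough illustration of the kind of human-versus-LLM comparison summarized above, the sketch below scores agreement between expert and model ratings and the spread of repeated model runs. The metric choices (mean absolute error, Pearson correlation, standard deviation across runs), the function names, and the 0 to 1 score scale are assumptions made for this example; the summary does not say which statistics the paper uses, and the numbers are made-up placeholders, not results from the study.

```python
from math import sqrt
from statistics import mean, pstdev


def mean_absolute_error(expert, model):
    """Average absolute gap between paired expert and model scores."""
    return mean(abs(e - m) for e, m in zip(expert, model))


def pearson(expert, model):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mean_e, mean_m = mean(expert), mean(model)
    cov = sum((e - mean_e) * (m - mean_m) for e, m in zip(expert, model))
    var_e = sum((e - mean_e) ** 2 for e in expert)
    var_m = sum((m - mean_m) ** 2 for m in model)
    if var_e == 0 or var_m == 0:
        return 0.0
    return cov / sqrt(var_e * var_m)


def run_spread(scores_by_run):
    """Population standard deviation of one score across repeated runs.

    A larger spread at a given sampling temperature suggests less stable output.
    """
    return pstdev(scores_by_run)


# Hypothetical placeholder scores on a 0-1 scale, NOT data from the paper.
expert_scores = [0.50, 0.75, 0.25, 1.00]
model_scores = [0.50, 0.50, 0.25, 0.75]

print("MAE:", mean_absolute_error(expert_scores, model_scores))
print("Pearson r:", pearson(expert_scores, model_scores))
print("Spread across runs:", run_spread([0.50, 0.75, 0.50, 0.75]))
```

Reporting an error measure alongside a correlation is one way to separate "the model ranks tools like the experts do" from "the model assigns similar absolute scores", which can diverge when a model is systematically lenient or harsh.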