Benchmarking Local Language Models for Social Robots using Edge Devices
arXiv cs.RO / 5/6/2026
Key Points
- The paper addresses a lack of systematic benchmarks for evaluating open-source local LLMs on edge devices for social-educational robots, focusing on responsiveness and privacy under tight compute constraints.
- It benchmarks 25 models on edge hardware (primarily Raspberry Pi 4, with checks on Raspberry Pi 5 and a laptop GPU) using three evaluation dimensions: inference efficiency, general knowledge (MMLU subset), and teaching effectiveness (LLM-rated quality validated by human raters).
- Results show large model-to-model trade-offs, with inference throughput and energy efficiency varying by more than an order of magnitude, MMLU accuracy ranging from near-random up to 57.2%, and teaching effectiveness that does not monotonically track either efficiency or knowledge scores.
- Granite4 Tiny Hybrid (7B) is identified as a strong overall choice, balancing efficiency and knowledge while achieving high teaching-relevant performance, and human validation largely confirms the automated ranking.
- The authors use the findings to propose a three-tier local inference architecture for the Robot Study Companion (RSC) to better balance latency, accuracy, and compute limits on resource-constrained hardware.
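The inference-efficiency dimension above comes down to timing token generation on the device. The sketch below shows the general shape of such a measurement; the `measure_throughput` helper and the `fake_generate` stand-in are illustrative assumptions, not the paper's actual benchmark harness, and a real run would call a local model's generate API (e.g. one served via llama.cpp) in place of the stub.

```python
import time

def measure_throughput(generate, prompt, n_runs=3):
    """Time a text-generation callable and return average tokens/second.

    `generate` is any callable taking a prompt and returning generated
    text; token count is approximated by whitespace splitting, which is
    a common rough proxy in edge benchmarks.
    """
    rates = []
    for _ in range(n_runs):
        start = time.perf_counter()
        output = generate(prompt)
        elapsed = time.perf_counter() - start
        rates.append(len(output.split()) / elapsed)
    return sum(rates) / len(rates)

# Illustrative stand-in for a real local model; a real benchmark
# would invoke the model's inference call here instead.
def fake_generate(prompt):
    time.sleep(0.01)  # simulate on-device inference latency
    return "token " * 50

rate = measure_throughput(fake_generate, "Explain photosynthesis simply.")
```

Energy efficiency would layer on top of this by sampling device power draw (e.g. via an external meter on a Raspberry Pi) over the same timing window and dividing tokens generated by joules consumed.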