Enhancing LLM Problem Solving via Tutor-Student Multi-Agent Interaction

arXiv cs.AI / 4/13/2026


Key Points

  • The paper proposes PETITE, a tutor-student multi-agent interaction framework that uses role-differentiated exchanges to improve LLM problem solving beyond standard prompting setups.
  • Two agents derived from the same LLM play asymmetric roles: a student agent iteratively drafts and refines code solutions while a tutor agent provides structured feedback without access to ground-truth answers.
  • PETITE is evaluated on the APPS coding benchmark and compared with methods such as Self-Consistency, Self-Refine, Multi-Agent Debate, and Multi-Agent Review.
  • Results indicate PETITE achieves similar or higher accuracy than prior approaches while using significantly fewer tokens, emphasizing resource efficiency.
  • The authors argue that developmental principles (scaffolding and peer-like tutoring structures) offer a principled alternative to relying on stronger supervisory models or heterogeneous ensembles.

Abstract

Human cognitive development is shaped not only by individual effort but also by structured social interaction, where role-based exchanges, such as those between a tutor and a learner, enable solutions that neither party could achieve alone. Inspired by these developmental principles, we ask whether a tutor-student multi-agent system can create a synergistic effect, pushing a Large Language Model (LLM) beyond what it can do within existing frameworks. To test this idea, we adopt the autonomous coding problem domain, where two agents instantiated from the same LLM are assigned asymmetric roles: a student agent generates and iteratively refines solutions, while a tutor agent provides structured evaluative feedback without access to ground-truth answers. In our proposed framework (PETITE), we aim to extract better problem-solving performance from a single model by structuring its interaction through complementary roles, rather than relying on stronger supervisory models or heterogeneous ensembles. We evaluate our framework on the APPS coding benchmark against the state-of-the-art approaches Self-Consistency, Self-Refine, Multi-Agent Debate, and Multi-Agent Review. The results show that our model achieves similar or higher accuracy while consuming significantly fewer tokens. These results suggest that developmentally grounded, role-differentiated interaction structures provide a principled and resource-efficient paradigm for enhancing LLM problem solving through structured peer-like interactions.

Index Terms: Peer Tutoring, Scaffolding, Large Language Models, Multi-Agent Systems, Code Generation
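The tutor-student interaction described above can be sketched as a simple refinement loop. This is a minimal illustration, not the paper's actual implementation: the prompt templates, the `llm` callable, and the stopping convention are all assumptions, standing in for a single underlying model instantiated in two roles.

```python
from typing import Callable

# Hypothetical role prompts; the real framework's prompts and feedback
# structure are not specified in this summary.
STUDENT_PROMPT = (
    "You are a student. Write or revise a code solution for: {task}\n"
    "Tutor feedback so far: {feedback}"
)
TUTOR_PROMPT = (
    "You are a tutor. Give structured feedback on this solution "
    "without revealing a ground-truth answer:\n{solution}"
)

def tutor_student_loop(llm: Callable[[str], str], task: str,
                       max_rounds: int = 3) -> str:
    """Alternate student drafting and tutor critique for a fixed budget.

    Both roles call the same `llm`, differing only in their prompts,
    mirroring the paper's use of one model in asymmetric roles.
    """
    feedback = "none yet"
    solution = ""
    for _ in range(max_rounds):
        # Student drafts (or revises) conditioned on the latest feedback.
        solution = llm(STUDENT_PROMPT.format(task=task, feedback=feedback))
        # Tutor evaluates the draft; it never sees reference answers.
        feedback = llm(TUTOR_PROMPT.format(solution=solution))
        if "LGTM" in feedback:  # toy stop signal; the real criterion is unknown
            break
    return solution
```

Capping `max_rounds` is one plausible way the token budget stays small relative to debate- or ensemble-style baselines: each round costs only two model calls.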