Hey everyone in ML. I've been working on Mahoraga, an open-source orchestrator that routes tasks across local and cloud AI agents using a contextual bandit (LinUCB) that learns from every decision.

Context (skip): I only started integrating AI into my workflows in late 2025, so I came on the scene broke with no credits. This left me with local models. However, many students and employees also receive credits from their institution to work with. (I got Claude, yippee.) I wanted to route seamlessly between models when credits ran out, which is why I built an orchestrator.

I used to use Claude more as a chatbot and complete workflow engine, which made it difficult to swap in local models because of the context window, reasoning, etc. Opus 4.5 running open-source "superpowers" ate my usage every month. Now I realize that wasn't an effective way to use Claude, or AI in general: I was using Claude for both heavy planning/brainstorming and minor tasks.

What about code generation specifically? Code generation is a relatively constrained task, with correct answers and short outputs. Surely local models can compete on tasks that don't need cloud? So I turned Mahoraga into an adaptive router.

I ran 192 tasks across 8 agents (4 local Ollama models, 4 cloud CLIs) on a 16GB MacBook Pro, forcing round-robin so every agent got every prompt. Quality is scored by a 4-layer heuristic system (novelty ratio, structural checks, embedding similarity, length ratio). Zero API cost for evaluation, and no LLM-as-judge.

Qwen3 4B in nothink mode dominates code and refactor at 33.8 t/s and 6.1s average latency. Cloud agents cluster around 0.650 on code. The local model isn't just cheaper; it's measurably better for this task class.

Other findings: the bandit (LinUCB) is the only routing strategy with sublinear regret (β=0.659) across a 200-task simulation, so it actually converges.

The routing works in two stages: the keyword classifier puts the task in a capability bucket (code, plan, research, etc.), and then the bandit picks the best agent within that bucket. It uses a 9-dimensional context vector, persistent state across sessions, and a warm-start from the compatibility matrix. All local inference, all free. Cloud escalation exists but only fires on retry. Why pay for cloud when a local model handles it better?

Looking for any feedback, any input. Feel free to be critical: I appreciate everyone who interacts on this subreddit. I will continue to work on this in the future. A star would be appreciated: github.com/pockanoodles/Mahoraga
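For anyone who wants to see the two-stage routing in code, here is a minimal sketch. It is not Mahoraga's actual implementation: the keyword table, the class and function names (`classify`, `LinUCBArm`, `Router`), the `alpha` exploration weight, and the agent names in the usage note are all assumptions made for illustration. It uses the standard disjoint LinUCB rule (ridge-regression estimate plus an upper confidence bonus) and leaves out persistence, warm-start, and cloud escalation.

```python
import math
import re

# Hypothetical keyword table; the post does not list the real keywords.
BUCKET_KEYWORDS = {
    "code": ("implement", "function", "bug", "refactor"),
    "plan": ("plan", "design", "roadmap"),
    "research": ("compare", "survey", "explain"),
}


def classify(prompt, default="code"):
    """Stage 1: return the first bucket with a keyword among the prompt's words."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    for bucket, keywords in BUCKET_KEYWORDS.items():
        if words.intersection(keywords):
            return bucket
    return default


class LinUCBArm:
    """Per-agent ridge-regression state, stored as A^-1 and b."""

    def __init__(self, dim):
        # A starts as the identity matrix, so A^-1 is the identity too.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.a_inv]

    def ucb(self, x, alpha):
        theta = self._a_inv_times(self.b)  # theta = A^-1 b
        mean = sum(t * xi for t, xi in zip(theta, x))
        var = sum(a * xi for a, xi in zip(self._a_inv_times(x), x))
        return mean + alpha * math.sqrt(max(var, 0.0))

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A^-1 after A += x x^T.
        ax = self._a_inv_times(x)
        denom = 1.0 + sum(a * xi for a, xi in zip(ax, x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= ax[i] * ax[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


class Router:
    """Stage 2: one set of LinUCB arms per capability bucket."""

    def __init__(self, agents_by_bucket, dim=9, alpha=1.0):
        self.alpha = alpha
        self.arms = {
            bucket: {name: LinUCBArm(dim) for name in names}
            for bucket, names in agents_by_bucket.items()
        }

    def route(self, prompt, context):
        bucket = classify(prompt)
        arms = self.arms[bucket]
        agent = max(arms, key=lambda name: arms[name].ucb(context, self.alpha))
        return bucket, agent

    def feedback(self, bucket, agent, context, reward):
        self.arms[bucket][agent].update(context, reward)
```

To use it, build `Router({"code": [...], "plan": [...], "research": [...]})` with your own agent names, call `route(prompt, context)` with a 9-element context vector, and pass the quality score back through `feedback`. Keeping `A^-1` up to date with a rank-one update avoids a matrix inversion on every decision, which matters little at nine dimensions but keeps the sketch free of dependencies.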
Qwen3 4B outperforms cloud agents on code tasks—with Mahoraga research [R]
Reddit r/MachineLearning / 4/28/2026
💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research
Key Points
- The article discusses Mahoraga, an open-source orchestrator that routes tasks between local and cloud AI agents using a contextual bandit (LinUCB) that learns from prior routing decisions.
- In a benchmark running 192 tasks across eight agents on a 16GB MacBook Pro, the local Qwen3 4B (in nothink mode) performed best on code and refactoring, ahead of cloud agents that cluster around 0.650 on code.
- The evaluation used a four-layer heuristic quality scoring approach (novelty ratio, structural checks, embedding similarity, and length ratio) with zero API cost and no LLM-as-judge.
- Additional findings indicate that some models may be fast but lose quality (e.g., LFM2), while others (e.g., DeepSeek-R1) can be too slow due to reasoning overhead for default use.
- The author notes limitations in the scoring setup, as security-related metrics were not well captured and showed little differentiation across agents.
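
The four-layer heuristic score could look roughly like the sketch below. The post only names the layers, so every definition here is an assumption: novelty is taken as the share of output tokens not copied from the prompt, the structural check is "does the output parse as Python", a bag-of-words cosine stands in for a real embedding model, the `target_chars` length target is hypothetical, and the layers are averaged with equal weights.

```python
import ast
import math
import re
from collections import Counter


def _tokens(text):
    return re.findall(r"[a-z_]\w*", text.lower())


def novelty_ratio(prompt, output):
    """Share of output tokens that do not simply repeat the prompt."""
    out = _tokens(output)
    if not out:
        return 0.0
    seen = set(_tokens(prompt))
    return sum(tok not in seen for tok in out) / len(out)


def structural_check(output):
    """1.0 if the output is non-empty and parses as Python, else 0.0."""
    if not output.strip():
        return 0.0
    try:
        ast.parse(output)
    except (SyntaxError, ValueError):
        return 0.0
    return 1.0


def similarity(prompt, output):
    """Bag-of-words cosine; a stand-in for embedding similarity."""
    ca, cb = Counter(_tokens(prompt)), Counter(_tokens(output))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def length_ratio(output, target_chars=400):
    """Closer to 1.0 the nearer the output is to the assumed target length."""
    n = len(output)
    if n == 0:
        return 0.0
    return min(n, target_chars) / max(n, target_chars)


def quality_score(prompt, output):
    """Equal-weight average of the four layers, in the range 0.0 to 1.0."""
    layers = (
        novelty_ratio(prompt, output),
        structural_check(output),
        similarity(prompt, output),
        length_ratio(output),
    )
    return sum(layers) / len(layers)
```

A scorer like this needs no API calls and no judge model, which matches the zero-cost evaluation described, but it also shows why a metric such as security is hard to capture: none of these layers looks at what the code actually does.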