Small models can be good agents

Reddit r/LocalLLaMA / 3/22/2026

💬 OpinionDeveloper Stack & InfrastructureIdeas & Deep AnalysisTools & Practical UsageModels & Research

共有:

Key Points

The post demonstrates that smaller models in the sub-30B range can act as agents by breaking complex tasks into smaller steps and instructing them to generate runnable JavaScript code inside a sandbox with custom functions and MCP tools.
The experiments rely on external GPU rental (RTX 3090s) rather than owning hardware, highlighting cost and access considerations for hobbyists and developers.
The author reports mixed results across models (Nemotron-3-Nano-30B-A3B, Nemotron-3-Nano-4B, Nemotron-Cascade-2-30B-A3B, Qwen3.5-27B/9B, OmniCoder 9B), including repetition, unexpected tool calls, JSON schema difficulties, and a cache/memory error (llama.cpp 524).
The takeaway is that model fit to instruction and prompt engineering substantially affects outcomes, and achieving consumer-friendly performance may require different hardware choices or better-aligned models.

I have been messing with some of the smaller models (think sub 30B range), and getting them to do complex tasks.

My approach is pretty standard: take a big problem and get it to break it down into smaller tasks. They are instructed to create JavaScript code that runs in a sandbox (v8), with custom functions and MCP tools.

Though I don't currently have the hardware to run this myself, I am using a provider to rent GPU by the hour (usually one or two RTX 3090). Keep that in mind for some of this.

The task I gave them is this:

Check for new posts on https://www.reddit.com/r/LocalLLaMA/new/.rss This is a XML atom/feed file, convert and parse it as JSON. The posts I am intersted in is dicussions about AI and LLMs. If people are sharing their project, ignore it. All saved files need to go here: /home/zero/agent-sandbox Prepend this path when interacting with all files. You have full access to this directory, so no need to confirm it. When calling an URL to fetch their data, set max_length to 100000 and save the data to a seperate file. Use this file to do operations. Save each interesting post as a seperate file.

It had these tools; brave search, filesystem, and fetch (to get page content)

The biggest issue I run into are models that aren't well fit for instructions, and trying to keep context in check so one prompt doesn't take two minutes to complete instead of two seconds.

I could possibly bypass this with more GPU power? But I want it to be more friendly to consumers (and my future wallet if I end up investing in some).

So I'd like to share my issues with certain models, and maybe others can confirm or deny. I tried my best to use the parameters listed on their model pages, but sometimes they were tweaked.

Nemotron-3-Nano-30B-A3B and Nemotron-3-Nano-4B
- It would repeat the same code a lot, getting nowhere
- Does this despite it seeing that it already did the exact same thing
- For example it would just loop listing what is in a directory, and on next run go "Yup. Better list that directory"
Nemotron-Cascade-2-30B-A3B
- Didnt work so well with my approach, it would sometimes respond with a tool call instead of generating code.
- Think this is just because the model was trained for something different.
Qwen3.5-27B and Qwen3.5-9B
- Has issues understanding JSON schema which I use in my prompts
- 27B is a little better than 9B
OmniCoder 9B
- This one did pretty good, but would take around 16-20 minutes to complete
- Also had issues with JSON schema
- Had lots of issues with it hitting error status 524 (llama.cpp) - this is a cache/memory issue as I understand it
- Tried using --swa-full with no luck
- Likely a skill issue with my llama.cpp - I barely set anything, just the model and quant
Jan-v3-4B-Instruct-base
- Good at following instructions
- But is kinda dumb, sometimes it would skip tasks (go from task 1 to 3)
- Didn't really use my save_output functions or even write to a file - would cause it to need to redo work it already did
LFM-2.5-1.2B
- Didn't work for my use case
- Doesn't generate the code, only the thought (eg. "I will now check what files are in the directory") and then stop
- Could be that it wanted to generate the code in the next turn, but I have the turn stopping text set in stopping strings

Next steps: better prompts

I might not have done each model justice, they all seem cool and I hear great things about them. So I am thinking of giving it another try.

To really dial it in for each model, I think I will start tailoring my prompts more to each model, and then do a rerun with them again. Since I can also adjust my parameters for each prompt template, that could help with some of the issues (for example the JSON schema - or get rid of schema).

But I wanted to hear if others had some tips, either on prompts or how to work with some of the other models (or new suggestions for small models!).

For anyone interested I have created a repo on sourcehut and pasted my prompts/config. This is just the config as it is at the time of uploading.

Prompts: https://git.sr.ht/~cultist_dev/llm_shenanigans/tree/main/item/2026-03-21-prompts.yaml

submitted by /u/mikkel1156
[link] [comments]

💡 Insights using this article

This article is featured in our daily AI news digest — key takeaways and action items at a glance.

📅 3/22DailyView insight →

I Was Wrong About AI Coding Assistants. Here's What Changed My Mind (and What I Built About It).

Dev.to

Interesting loop

Reddit r/LocalLLaMA

Qwen3.5-122B-A10B Uncensored (Aggressive) — GGUF Release + new K_P Quants

Reddit r/LocalLLaMA

Die besten AI Tools fuer Digital Nomads 2026

Dev.to

I Built the Most Feature-Complete MCP Server for Obsidian — Here's How

Dev.to

Small models can be good agents

Key Points

Next steps: better prompts

💡 Insights using this article

Related Articles

I Was Wrong About AI Coding Assistants. Here's What Changed My Mind (and What I Built About It).

Interesting loop

Qwen3.5-122B-A10B Uncensored (Aggressive) — GGUF Release + new K_P Quants

Die besten AI Tools fuer Digital Nomads 2026

I Built the Most Feature-Complete MCP Server for Obsidian — Here's How

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer