[Model Release] I trained a 9B model to be agentic Data Analyst (Qwen3.5-9B + LoRA). Base model failed 100%, this LoRA completes 89% of workflows without human intervention.

Reddit r/LocalLLaMA / 4/10/2026


Key Points

  • A developer released a new agentic data-analysis LoRA for Qwen3.5-9B (built on agentscope-ai/CoPaw-Flash-9B) claiming the base model could not complete open-ended analysis tasks at all.
  • The LoRA is trained on large multi-step trace datasets (finance/education/sports) to perform end-to-end loops including planning, executing and debugging Python, visualizing, and summarizing results.
  • In tests across 29 Kaggle datasets (custom harness, max_turns=50, 128K context), the base model reportedly achieved a 0% usable-completion rate, while the LoRA reached about an 89.7% “natural completion rate” without human intervention.
  • The post provides practical deployment guidance, stating approximate VRAM requirements for inference using vLLM (about 22GB in bf16 on a single GPU, ~12GB at 8-bit, ~6GB at 4-bit) plus links to the LoRA weights and an accompanying inference/tool-calling framework.

Hey r/LocalLLaMA,

Most of us know the struggle with local "Agentic" models. Even good ones at the 4B-14B scale are usually just glorified tool-callers. If you give them an open-ended prompt like "Analyze this dataset and give me insights," they do one step, stop, and wait for you to prompt them to "continue."

I wanted to see if a small <10B model could achieve true autonomy through weights, rather than relying on massive external prompting frameworks.

What I built: I took agentscope-ai/CoPaw-Flash-9B (which is based on the Qwen3.5-9B architecture) and trained a LoRA specifically for end-to-end data analysis workflows.

The Secret Sauce (Training Data): Instead of standard instruction tuning, I constructed massive, multi-step trace datasets covering real-world scenarios (finance, education, sports data). The LoRA was trained not just to call tools, but to plan, execute, debug Python code, visualize, and summarize in a continuous loop until the job is done.
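The post doesn't publish the trace format, but to make the idea concrete, one record in such a multi-step trace dataset might look roughly like this. All field names and contents here are my own illustrative assumptions, not the released dataset's schema:

```python
# Hypothetical shape of ONE multi-step training trace (schema is assumed,
# not taken from the released dataset). The key property: the trace spans
# the whole loop, including a failure and its fix, not a single tool call.
trace = {
    "task": "Analyze this dataset and give me insights",
    "domain": "finance",
    "steps": [
        {"role": "assistant", "type": "plan",
         "content": "1) load CSV  2) check nulls  3) plot returns  4) summarize"},
        {"role": "assistant", "type": "code",
         "content": "import pandas as pd\ndf = pd.read_csv('prices.csv')"},
        {"role": "tool", "type": "execution_result",
         "content": "FileNotFoundError: prices.csv"},
        # The model is trained to debug and continue, not stop and ask.
        {"role": "assistant", "type": "code",
         "content": "df = pd.read_csv('data/prices.csv')"},
        {"role": "tool", "type": "execution_result",
         "content": "ok, shape=(2516, 7)"},
        {"role": "assistant", "type": "summary",
         "content": "Returns are right-skewed; see attached chart..."},
    ],
}

# Supervising on the full plan -> code -> error -> fix -> summary sequence
# is what teaches "keep going until done" instead of one-step tool calling.
step_types = [s["type"] for s in trace["steps"]]
print(step_types)
```

The point of including the `execution_result` error followed by a corrected code step is that debugging behavior gets baked into the weights rather than delegated to an external prompting framework.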

The Results (see Benchmark Image 2): I tested it on 29 real Kaggle datasets using a custom framework (max_turns=50, context=128K).

  • Base Model: Averages 1.2 iterations and stops. 0% completion rate. Produces zero usable output.
  • With My LoRA: Averages 26 autonomous iterations. Writes Python, plots charts, and achieves an 89.7% natural completion rate with ZERO human intervention.
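The harness itself isn't published; a minimal sketch of the control loop implied by max_turns=50 could look like the following. The `model` and `execute_python` interfaces and the `FINAL:`/`CODE:` markers are placeholders I made up for illustration:

```python
# Minimal agentic-loop harness sketch. All interfaces are hypothetical;
# "natural completion" here means the model emits a final report on its
# own before the turn budget (the post's max_turns=50) is exhausted.

def run_workflow(model, execute_python, task, max_turns=50):
    """Loop until the model declares completion or the turn cap is hit."""
    history = [{"role": "user", "content": task}]
    for turn in range(1, max_turns + 1):
        reply = model(history)  # returns a plan, a code step, or a final report
        history.append({"role": "assistant", "content": reply})
        if reply.startswith("FINAL:"):          # model says the job is done
            return {"completed": True, "turns": turn, "report": reply}
        if reply.startswith("CODE:"):           # run code, feed result back
            result = execute_python(reply[len("CODE:"):])
            history.append({"role": "tool", "content": result})
    # Turn budget exhausted without a final report -> not a natural completion.
    return {"completed": False, "turns": max_turns, "report": None}


# Tiny stub model that writes code twice, then finishes on turn 3.
def make_stub():
    replies = iter(["CODE: x = 1", "CODE: x += 1", "FINAL: x is 2"])
    return lambda history: next(replies)

outcome = run_workflow(make_stub(), lambda code: "ok", "analyze data")
print(outcome)
```

Under this framing, the base model's "1.2 iterations and stops" behavior corresponds to emitting a final-looking answer (or just halting) after the first turn, while the LoRA keeps the loop alive for ~26 turns on average.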

It basically turns a 9B model into a junior data analyst you can run locally on 12GB-24GB VRAM.

VRAM Requirements (vLLM):

  • bf16 (Single GPU): ~22GB
  • 8-bit: ~12GB
  • 4-bit: ~6GB
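These figures are roughly consistent with back-of-the-envelope weight-memory math for 9B parameters; the gap between raw weights and the quoted totals is KV cache (which grows with the 128K context) plus runtime overhead:

```python
# Rough weight-only memory estimate for a 9B-parameter model.
# Actual vLLM usage is higher: it also allocates KV cache and CUDA/runtime
# overhead, which is why ~18 GB of bf16 weights lands near ~22 GB in practice.

PARAMS = 9e9  # 9B parameters

def weights_gb(bits_per_param: float) -> float:
    """GB needed just to hold the weights at a given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

print(f"bf16 : {weights_gb(16):.1f} GB weights")  # 18.0 -> ~22 GB total
print(f"8-bit: {weights_gb(8):.1f} GB weights")   #  9.0 -> ~12 GB total
print(f"4-bit: {weights_gb(4):.1f} GB weights")   #  4.5 -> ~6 GB total
```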

Links:

⚠️ A Call to the Community (Looking for Compute/Sponsorship):

This one-week experiment proved something important: Small models CAN be fully autonomous agents if trained on scenario-based workflows.

Data analysis is just the beginning. I want to apply this methodology to build local, truly autonomous agents for Coding (Software Engineers), Research Assistants, and more.

However, I am currently bottlenecked by hardware and funding. Training these continuous-workflow datasets takes significant juice, and I want to scale this to create state-of-the-art open agents.

If anyone here has access to compute grants, GPU clusters they are willing to sponsor, or if there are organizations/backers interested in funding the development of open-source local agents, please reach out to me via DM.

Let's build local agents that actually do the work for us. Happy to answer any questions about the training process, data generation, or deployment in the comments!

submitted by /u/Awkward_Run_9982