Qwen3.6-35B-A3B-UD-IQ4_XS C++ to Rust Code Port Test: It Worked (Mostly)!

Reddit r/LocalLLaMA / 4/25/2026


Key Points

  • The author reports that Qwen3.6-35B-A3B-UD-IQ4_XS significantly outperformed earlier local Qwen3.5 models on their usual code/test tasks, showing faster debugging and more cloud-like reasoning quality.
  • They emphasize that the model is sparse, which they say improves runtime speed while maintaining higher-quality outputs compared with other local models they tested.
  • To stress-test the model beyond simple prompts, the author ported an existing C++ project (OddVoices/liboddvoices) to Rust, using it as a real codebase migration exercise.
  • The port produced by the model is described as largely successful, with only minor bugs remaining, suggesting strong practical capability for non-trivial C++→Rust translation.
  • The post overall suggests local LLMs are reaching a level where they can handle complex software engineering tasks that previously caused failures in older cloud-model attempts.

When Qwen3.6-35B-A3B was released a week or so ago, I expected an iterative improvement on the previous Qwen3.5 models. After all, those models were pretty decent compared with the previous local models I had tried, and Qwen3.5 did well on the fairly boring ThreeJS task I've been using to test local models. Well, Qwen3.6 did in about a minute what took that model several minutes of debugging. I quickly realized this thing was somehow way smarter than its predecessor; in fact, it was more comparable to the cloud models I've been trying than to any of the previous local models I've used. Gemma 4 comes the closest, but even it seemed to have less insightful planning and generally a higher rate of errors on the tasks I use LLMs for, as compared with this Qwen model.

That's also ignoring the fact that this is a sparse (mixture-of-experts) model, which means it runs several times faster while producing, in my opinion, significantly higher-quality output. I also tried having it explore and summarize several complex codebases, and in only a minute or two it would return with a detailed report on whatever I was asking about. I was getting the feeling that my shitty snake test wouldn't really cut it for this model, so I thought I would try porting a C++ project I quite like to Rust.

I've wanted to package OddVoices into something more user-friendly for a while now. For context, it's basically an obscure open-source alternative to Vocaloid or UTAU. I've recently experimented with writing VST3 audio plugins with Rust, and with NIH-Plug and egui, it's almost trivial. Anyway, I figured I could get a head start with either turning OddVoices into a plugin or some other sort of graphical program by porting the liboddvoices code to Rust. Even better, it would be the perfect test of this new Qwen model. Honestly, some cloud models from the not-too-distant past would have failed miserably at something like this, so I hope it's obvious how wild it is that a local model could even attempt it.

Well, attempt it did, and I'm happy to share that it was (mostly) a success. The port it created has a few minor bugs that slightly affect speed and cause issues with certain sounds, resulting in occasional peaks, but it sounds virtually identical to the original code. I manually tested the output as it was working on the code and directed it to reference the C++ implementation when certain aspects of the sound weren't working properly. Not only did it use my vague direction to find the right bits of code to reference, but it also recognized when its own implementation was at fault and updated it based on what it learned from the original code. Of course, that's the whole point of porting code: to copy the original implementation. But even larger LLMs tend to gloss over specifics until they rear their heads in testing. Still, this tiny model behaves virtually identically to a much larger cloud model. If you told me this was a new revision of Haiku, I would probably believe you.
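To give a flavor of what this kind of port involves at the function level, here's a small hypothetical example in the spirit of audio DSP code. To be clear, this is not actual liboddvoices code, and `lerp_sample` is a made-up name; it just illustrates the kind of C++ pattern the model had to translate into idiomatic, bounds-safe Rust:

```rust
// A C++ original might look like this (out-of-bounds access is UB):
//
//   float lerp_sample(const std::vector<float>& table, float phase) {
//       int i = (int)phase;
//       float frac = phase - i;
//       return table[i] * (1.0f - frac) + table[i + 1] * frac;
//   }
//
// An idiomatic Rust port takes a slice instead of a vector reference and
// returns Option, making the out-of-bounds case explicit instead of UB.
fn lerp_sample(table: &[f32], phase: f32) -> Option<f32> {
    let i = phase as usize;
    let frac = phase - i as f32;
    // `get` returns None if either index is past the end of the table.
    let (a, b) = (*table.get(i)?, *table.get(i + 1)?);
    Some(a * (1.0 - frac) + b * frac)
}

fn main() {
    let table = [0.0_f32, 1.0, 0.5];
    // Halfway between table[0] and table[1].
    assert_eq!(lerp_sample(&table, 0.5), Some(0.5));
    // Halfway between table[1] and table[2].
    assert_eq!(lerp_sample(&table, 1.5), Some(0.75));
    // Reading past the end is a None, not undefined behavior.
    assert_eq!(lerp_sample(&table, 2.0), None);
}
```

The subtle part of any such port is exactly what the author describes: details like index rounding and edge behavior look trivial but change the sound if the translation glosses over them.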

Waveform of output from the current Rust port vs. the original C++ liboddvoices engine.

Of course, those issues I mentioned can be fixed with further testing, but I wanted you to hear what it accomplished in roughly 5 hours of total development time over 2 nights. This model feels like what Stable Diffusion 1.4 was to DALL-E 2: proof that local can be as good as, and in some cases better than, big cloud models. I've been using an all-local workflow for the past week or so, thanks mostly to this model, and I haven't noticed any major difference between it and much larger models.

So, case in point: this shit is incredible. I never would have thought anything this good would run at this speed on my computer. If you haven't had a chance to look at this model and Gemma 4, please check both out. As people have been saying, Gemma 4 is a better all-around model for conversational tasks, and Qwen3.6 is exceptional for agentic coding.

Since people often ask what I use, and since I thought I would actually try to get everything I need running locally, here's what I've been using:

Backend: Ooba's TextGen - A decent portable wrapper for Llama.cpp and several other popular backends. I think it has better UX than many other hosting options, and the chat interface it ships with is pretty decent, too.

Agent/Editor: Cline + VSCodium - VSCodium strips all of the Microslop BS out while Cline gives you a capable FOSS agent extension that easily hooks up to both local and cloud models.

MCP: Grounded Docs MCP + Granite-Embedding-278m-multilingual + KoboldCpp - This is the most elegant local replacement I could come up with for Context7. You could use Ooba for this as well, but KoboldCpp is much easier to script with, as it's a single binary file. You can run the embedding model on a GPU for indexing documentation much faster, and then run it on a CPU for regular queries when your GPU is occupied with your main model. This is essentially a classic RAG system with a built-in web scraper. This makes working on complex projects with many external dependencies much more bearable, especially for small models with limited world knowledge. Context7 is the hassle-free cloud alternative, but is it really local if a critical pillar of your development setup is a proprietary cloud service?
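The retrieval core behind a classic RAG setup like this is simple: embed the documentation chunks once, then rank them by cosine similarity against the query embedding. Here's a minimal sketch of that ranking step (illustrative only; the actual MCP server's internals, chunking, and scraper are not shown, and the function names are mine):

```rust
/// Cosine similarity between two embedding vectors of equal length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Return the k document chunks most similar to the query embedding.
/// In a real setup the embeddings would come from a model such as
/// Granite-Embedding-278m served by a local backend.
fn top_k<'a>(query: &[f32], chunks: &'a [(String, Vec<f32>)], k: usize) -> Vec<&'a str> {
    let mut scored: Vec<(f32, &str)> = chunks
        .iter()
        .map(|(text, emb)| (cosine(query, emb), text.as_str()))
        .collect();
    // Sort descending by similarity score.
    scored.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap());
    scored.into_iter().take(k).map(|(_, text)| text).collect()
}

fn main() {
    // Toy 2-dimensional "embeddings" standing in for real model output.
    let chunks = vec![
        ("egui docs chunk".to_string(), vec![1.0, 0.0]),
        ("nih-plug docs chunk".to_string(), vec![0.0, 1.0]),
    ];
    let hits = top_k(&[0.9, 0.1], &chunks, 1);
    assert_eq!(hits, ["egui docs chunk"]);
}
```

Running the embedding pass on GPU for bulk indexing and on CPU for one-off queries, as described above, works because this ranking step is cheap; only producing the embeddings is model-bound.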

If this is a new baseline for local models at this size, I'm pretty stoked to see what future models are capable of. I still feel like I haven't really reached the limits of what this thing can do, which I've never really felt before with other models.

submitted by /u/EuphoricPenguin22