Hey everybody, I have a strong interest in offloading work to small, specialized models that I can run in parallel; this lets me scale work significantly, and it makes me less dependent on proprietary APIs. Some time ago I saw a blog post from Wiz about fine-tuning Llama 3.2-1B for secret detection in code, where they reported 86% precision and 82% recall. I wanted to see if I could replicate (or beat) those numbers using purely local AI and produce a local specialized model. After a couple of weekends of experimentation, I got a Llama 3.2-1B hitting 88% precision and 84.4% recall simultaneously! I also benchmarked Qwen 3.5-2B and 4B; as expected, they outperformed the Llama 1B at the cost of more VRAM and longer inference time. I've put together a full write-up with the training stats, examples, and a step-by-step breakdown of what I went through to hit these metrics. Warning: it's technical and pretty long, but I honestly think it's fun to read. Here are some highlights:
Would love to hear if anyone else is pursuing efficient 1B/3B finetunes for specialized tasks and about your stack!
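For readers less familiar with the metrics quoted above: precision is the fraction of flagged items that are real secrets, and recall is the fraction of real secrets that get flagged. A minimal sketch (the counts are illustrative, not the post's actual confusion matrix):

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Compute precision = TP/(TP+FP) and recall = TP/(TP+FN).

    tp: true positives (real secrets correctly flagged)
    fp: false positives (benign strings flagged as secrets)
    fn: false negatives (real secrets the model missed)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Illustrative counts: 88 secrets caught, 12 false alarms, 16 misses
p, r = precision_recall(tp=88, fp=12, fn=16)
print(f"precision={p:.1%} recall={r:.1%}")  # precision=88.0% recall=84.6%
```

The trade-off the post wrestles with is visible here: lowering the detection threshold converts false negatives into true positives (recall up) but usually adds false positives (precision down), which is why hitting both numbers simultaneously is the hard part.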
A technical, 100% local writeup on how I replicated and then surpassed the Secret Detection model from Wiz (and the challenges along the way) - including labeling an entire dataset with local AI
Reddit r/LocalLLaMA / 4/6/2026
💬 Opinion · Ideas & Deep Analysis · Tools & Practical Usage · Models & Research
Key Points
- The author describes a fully local attempt to replicate and surpass Wiz’s reported Llama 3.2-1B “secret detection” fine-tuning results, improving to 88% precision and 84.4% recall after several weekends of experimentation.
- They benchmarked alternative small language models (including Qwen 3.5 2B and 4B), noting that higher-performing models required more VRAM and incurred longer inference times.
- Publicly sourced data was supplemented via procedural generation, and the dataset was labeled locally using Qwen3-Coder-Next; the project also involved training the models to output structured JSON.
- Initial schema/JSON compliance was effectively zero for baseline SLMs, but training improved it to 98–100% compliance, enabling reliable structured predictions.
- The work uncovered data quality pitfalls (e.g., an “embarrassing” high-entropy class and misclassified negatives that included real-world passwords), and correcting these issues improved recall for passwords.
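The jump from near-zero to 98–100% schema compliance described above implies some validator was run over raw model outputs. A minimal sketch of how such a compliance rate could be measured, assuming a hypothetical output schema with a `findings` list whose entries carry `type` and `value` fields (the actual schema in the write-up may differ):

```python
import json

REQUIRED_KEYS = {"type", "value"}  # hypothetical per-finding fields


def is_compliant(raw: str) -> bool:
    """Return True if a model output parses as JSON and matches the expected shape."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    findings = obj.get("findings")
    if not isinstance(findings, list):
        return False
    return all(isinstance(f, dict) and REQUIRED_KEYS <= f.keys() for f in findings)


outputs = [
    '{"findings": [{"type": "aws_key", "value": "AKIA..."}]}',  # valid
    'Sure! Here is the JSON you asked for: {...}',               # chatty preamble, fails
    '{"findings": []}',                                          # valid, no secrets found
]
rate = sum(map(is_compliant, outputs)) / len(outputs)
print(f"compliance: {rate:.0%}")
```

Baseline SLMs typically fail this check by wrapping the JSON in conversational text or markdown fences, which is exactly what fine-tuning on strictly formatted targets trains out.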