Qwen 3.5 "Weight Drift" Fix? Automated Tool + Inconclusive NIAH Results

Reddit r/LocalLLaMA / 4/12/2026


Key Points

  • A community member proposes an open-source “weight drift” repair approach for Qwen 3.5 by scaling specific ssm_conv1d.weight tensors, originally reported to reduce errors substantially.
  • The author created an automated detection-and-repair tool (using Median Absolute Deviation Z-scores) to standardize the fix, but their initial Needle-in-a-Haystack (125k context) tests show no performance difference between the original BF16 and repaired model.
  • The author notes the reported “context melt-down” was not observed, suggesting the fix may target a narrower failure mode (e.g., logic/code-generation issues) that NIAH does not measure.
  • They are requesting broader verification via other benchmarks (PPL, HumanEval, EQ-Bench) and help auditing the repair/math and script logic.
  • The post frames the effort as a call for collaboration to confirm findings and refine the utility into a reliable community tool.

The Context

I’ve been following a thread about Qwen 3.5 by u/EvilEnginer, who claims a 90% error reduction from scaling specific ssm_conv1d.weight tensors.
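For anyone curious what "scaling a drifted tensor" might look like in practice, here is a minimal sketch. The original thread's exact scaling rule wasn't published in this post, so the choice of RMS as the scale measure and the idea of matching a reference value are my assumptions, purely for illustration:

```python
import numpy as np

def repair_tensor(weight, target_rms):
    """Rescale a drifted weight tensor so its RMS matches a reference.

    Assumptions (not from the original thread): the reference value
    target_rms would come from healthy layers (e.g. their median RMS),
    and a uniform multiplicative rescale is the right repair.
    """
    rms = np.sqrt(np.mean(weight ** 2))
    if rms == 0:
        return weight  # nothing to rescale
    return weight * (target_rms / rms)

# A tensor whose scale has "drifted" to roughly 2x the reference
drifted = np.array([2.0, -2.0, 1.8, -2.2])
print(repair_tensor(drifted, target_rms=1.0))
```

The real tool operates on checkpoint files rather than toy arrays, of course, but the arithmetic per tensor would reduce to something like this.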

My Testing

I’m interested in seeing if we can confirm their results and make this fix a standard, transparent utility for the community. Based on the findings shared by u/EvilEnginer regarding tensor scales in the final blocks, I’ve written an independent tool to automate the detection and repair of this drift. However, my initial testing is inconclusive:

- NIAH (Needle In A Haystack) @ 125k context: Both the original BF16 and my repaired version passed with identical scores.

I didn't see the context "melt-down" described in the original thread, which suggests this fix might target a more specific failure mode (like logic loops or code generation) that NIAH doesn't catch.

The Tool & Call for Collaboration

I’ve automated the detection (using Median Absolute Deviation Z-scores) and the repair logic. I’d love to see if the community can help confirm u/EvilEnginer’s findings and help refine this so we have a reliable, open-source way to apply these repairs.
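To make the detection step concrete for reviewers, here is a small sketch of MAD-based Z-scoring over per-tensor scale statistics. The threshold value and the idea of flagging on per-layer norms are my assumptions for illustration; the actual tool may differ:

```python
import numpy as np

def mad_zscores(values):
    """Robust Z-scores via Median Absolute Deviation.

    The 0.6745 factor rescales the MAD so scores are comparable to
    standard-deviation Z-scores under a normal distribution.
    """
    values = np.asarray(values, dtype=np.float64)
    med = np.median(values)
    mad = np.median(np.abs(values - med))
    if mad == 0:
        return np.zeros_like(values)  # all values identical: no outliers
    return 0.6745 * (values - med) / mad

def flag_outliers(tensor_norms, threshold=3.5):
    """Return indices of layers whose tensor scale deviates from the rest.

    threshold=3.5 is a common rule-of-thumb cutoff, not necessarily
    what the actual tool uses.
    """
    z = mad_zscores(tensor_norms)
    return [i for i, s in enumerate(z) if abs(s) > threshold]

# Hypothetical per-layer norms of ssm_conv1d.weight, last layer drifted
norms = [1.01, 0.99, 1.02, 0.98, 1.00, 2.40]
print(flag_outliers(norms))  # → [5]
```

MAD is a sensible choice here because, unlike mean/standard-deviation Z-scores, the drifted tensors themselves can't inflate the baseline and mask their own deviation.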

As I don’t have the horsepower for thorough benchmarking myself, I’m hoping the community can help with:

  1. Before/After Benchmarking: If you have the setup for PPL, HumanEval, or EQ-Bench, can you verify a delta between the original and repaired versions?

  2. Logic/Script Checking: Frankly, this is approaching the limits of my knowledge. Is my math missing something? Is my script mishandling anything?

submitted by /u/Decivox