Path-Lock Expert: Separating Reasoning Mode in Hybrid Thinking via Architecture-Level Separation
arXiv cs.CL / 5/1/2026
Key Points
- Hybrid-thinking language models with think/no-think modes still suffer from “reasoning leakage,” because both modes are effectively encoded in the same feed-forward parameters.
- The paper proposes Path-Lock Expert (PLE), which replaces each decoder-layer MLP with two mode-specific “experts” (think vs. no-think) while keeping attention, embeddings, normalization, and the LM head shared.
- A deterministic control-token router selects exactly one expert path for the entire sequence, ensuring mode-pure updates during supervised fine-tuning and preserving the dense model’s computation pattern.
- Experiments on math and science reasoning benchmarks show PLE keeps strong think performance and significantly improves the no-think mode’s accuracy and conciseness while reducing leakage.
- On Qwen3-4B (e.g., AIME24), PLE is reported to reduce no-think reflective tokens from 2.54 to 0.39 and raise no-think accuracy from 20.67% to 40.00%, without degrading think-mode performance.
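The core mechanism described above can be sketched in a few lines of PyTorch. This is a minimal illustration under assumptions from the summary (two mode-specific MLP experts per decoder layer, with a deterministic per-sequence route fixed by a control token); all class and variable names are hypothetical, not the authors' code.

```python
# Sketch of the Path-Lock Expert (PLE) idea: the dense decoder-layer MLP is
# duplicated into two mode-specific experts, and a control token picks exactly
# one expert path for the whole sequence. Names here are illustrative.
import torch
import torch.nn as nn

THINK, NO_THINK = 0, 1  # hypothetical mode IDs derived from the control token

class PathLockMLP(nn.Module):
    """Replaces a dense decoder-layer MLP with two mode-specific experts."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        def make_expert():
            return nn.Sequential(
                nn.Linear(d_model, d_ff),
                nn.GELU(),
                nn.Linear(d_ff, d_model),
            )
        # One expert per mode; only one is active for a given sequence, so
        # per-token compute matches the original dense MLP.
        self.experts = nn.ModuleList([make_expert(), make_expert()])

    def forward(self, x: torch.Tensor, mode: int) -> torch.Tensor:
        # Deterministic routing: the control token fixes `mode` once for the
        # whole sequence, so SFT gradients only update that expert's weights
        # ("mode-pure" updates), while attention, embeddings, norms, and the
        # LM head stay shared outside this module.
        return self.experts[mode](x)

mlp = PathLockMLP(d_model=32, d_ff=64)
h = torch.randn(2, 5, 32)          # (batch, seq, d_model) hidden states
out_think = mlp(h, THINK)
out_nothink = mlp(h, NO_THINK)
```

Because routing is a plain index rather than a learned gate, inference keeps the dense model's computation pattern: no extra experts are evaluated, and the two paths can never blend within one sequence.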