I abliterated Sarvam-30B and 105B - India's first multilingual MoE reasoning models - and found something interesting along the way!
Reasoning models have 2 refusal circuits, not one. The <think> block and the final answer can disagree: the model reasons toward compliance in its CoT and then refuses anyway in the response.
Killer finding: one English-computed direction removed refusal in most of the other supported languages (Malayalam, Hindi, Kannada among few). Refusal is pre-linguistic.
30B model: https://huggingface.co/aoxo/sarvam-30b-uncensored
105B model: https://huggingface.co/aoxo/sarvam-105b-uncensored
[link] [comments]



