My last post was a lie - Nemotron-3-Super-120b was unlike anything so far. My haste led me to believe that my last attempt was actually ablated - and while it didnt refuse seemed to converse fine, it’s code was garbage. This was due to the fact that I hadn’t taken into consideration it’s mix of LatentMoE and Mamba attention. I have spent the past 24 hrs remaking this model taking many things into account.
Native MLX doesn’t support LatentMoE at the moment - you will have to make your own .py or use MLX Studio.
I had to cheat with this model. I always say I don’t do any custom chat templates or fine tuning or cheap crap like that, only real refusal vector removal, but for this first time, I had no other choice. One of the results of what I did ended with the model often not producing closin think tags properly.
Due to its unique attention, there is no “applying at fp16 and quantizing down”. All of this has to be done at it’s quantization level. The q6 and q8 are coming by tomorrow at latest.
I have gone out of my way to also do this:
HarmBench: 97%
HumanEval: 94%
Please feel free to try it out yourselves. I really apologize to the few ~80 people or so who ended up wasting their time downloading the previous model.
IVE INCLUDED THE CUSTOM PY AND THE CHAT TEMPLATE IN THE FILES SO U GUYS CAN MLX. MLX Studio will have native support for this by later tonight.
https://huggingface.co/dealignai/Nemotron-3-Super-120B-A12B-4bit-MLX-CRACK-Uncensored
[link] [comments]




