
mlx-tune – fine-tune LLMs on your Mac (SFT, DPO, GRPO, Vision) with an Unsloth-compatible API

Reddit r/LocalLLaMA / 3/17/2026


Key Points

  • mlx-tune is an open-source library that lets you fine-tune LLMs natively on Apple Silicon using MLX, while exposing an Unsloth-compatible API so training scripts run on Mac or CUDA with a simple import switch.
  • It supports SFT with native MLX training (LoRA/QLoRA) and a range of advanced fine-tuning methods including DPO, ORPO, GRPO, KTO, and SimPO, plus vision-model fine-tuning (Qwen3.5 VLM) and chat templates for about 15 models.
  • You can export trained models to HuggingFace format or GGUF for Ollama/llama.cpp, and it runs on 8GB+ RAM (16GB+ recommended) for 1B 4-bit models.
  • This is meant for local development and prototyping rather than replacing Unsloth; the idea is to iterate on Mac and then push to CUDA for the real training run.
  • It’s a solo project with honest limitations (GGUF export from quantized bases not supported, RL trainers process one sample at a time), with GitHub/docs/PyPI links and an invitation for feedback, especially from Mac users.

Hello everyone,

I've been working on mlx-tune, an open-source library for fine-tuning LLMs natively on Apple Silicon using MLX.

I built this because I use Unsloth daily on cloud GPUs, but I wanted to prototype training runs locally on my Mac before spending on GPU time. Since Unsloth depends on Triton (no Mac support yet), I wrapped Apple's MLX framework in an Unsloth-compatible API — so the same training script works on both Mac and CUDA, changing only the import line.

What it supports right now:

  • SFT with native MLX training (LoRA/QLoRA)
  • DPO, ORPO, GRPO, KTO, SimPO — all with proper loss implementations
  • Vision model fine-tuning — Qwen3.5 VLM training with LoRA
  • Chat templates for 15 models (Llama 3, Gemma, Qwen, Phi, Mistral, DeepSeek, etc.)
  • Response-only training via train_on_responses_only()
  • Export to HuggingFace format, GGUF for Ollama/llama.cpp
  • Works on 8GB+ unified memory (1B 4-bit models); 16GB+ recommended
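Response-only training, listed above, follows the common pattern of masking the loss on prompt tokens so only assistant replies contribute to gradients. A minimal sketch of that idea in plain Python (illustrative only — `mask_prompt_labels` is a hypothetical helper, not mlx-tune's actual implementation):

```python
IGNORE_INDEX = -100  # label value typically ignored by cross-entropy loss

def mask_prompt_labels(token_ids, response_spans):
    """Copy token_ids into labels, masking everything outside response spans.

    response_spans: list of (start, end) index pairs marking assistant replies.
    Tokens outside those spans get IGNORE_INDEX, so the loss skips them.
    """
    labels = [IGNORE_INDEX] * len(token_ids)
    for start, end in response_spans:
        labels[start:end] = token_ids[start:end]
    return labels

# Example: tokens 0-3 are the prompt, tokens 4-6 the assistant response.
tokens = [11, 22, 33, 44, 55, 66, 77]
print(mask_prompt_labels(tokens, [(4, 7)]))
# → [-100, -100, -100, -100, 55, 66, 77]
```

In mlx-tune this is what `train_on_responses_only()` enables at the trainer level, per the feature list above.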

```python
# Just swap the import
from mlx_tune import FastLanguageModel, SFTTrainer, SFTConfig
# ... rest of your Unsloth code works as-is
```
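If you want the swap to happen automatically rather than by hand-editing the import, one option is to pick whichever backend is installed at runtime. A sketch under that assumption (`pick_backend` is a hypothetical helper, not part of mlx-tune or Unsloth):

```python
import importlib
import importlib.util

def pick_backend(candidates=("unsloth", "mlx_tune")):
    """Return the first backend package that is installed, or None.

    Lets one training script use Unsloth on a CUDA box and
    mlx-tune on a Mac without editing the import line by hand.
    """
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None

backend = pick_backend()
if backend is not None:
    # Both libraries expose the same class names, so the rest of the
    # script can stay identical, e.g.:
    # FastLanguageModel = importlib.import_module(backend).FastLanguageModel
    print(f"Using backend: {backend}")
```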

Some context: this was previously called unsloth-mlx, but I renamed it to mlx-tune to avoid confusion with the official Unsloth project. Same library, same vision — just a clearer name.

What it's NOT: a replacement for Unsloth. Unsloth with custom Triton kernels is faster on NVIDIA hardware. This is for the local dev loop — experiment on your Mac, get your pipeline working, then push to CUDA for the real training run.

Honest limitations:

  • GGUF export doesn't work from quantized base models (mlx-lm upstream limitation)
  • RL trainers process one sample at a time currently
  • It's a solo project, so feedback and bug reports genuinely help

GitHub: https://github.com/ARahim3/mlx-tune
Docs: https://arahim3.github.io/mlx-tune/
PyPI: pip install mlx-tune

Would love feedback, especially from folks fine-tuning on M1/M2/M3/M4/M5.

submitted by /u/A-Rahim