AFM MLX has been optimized to squeeze even more performance out of macOS than the Python version. It is 100% native Swift and 100% open source. https://github.com/scouzi1966/maclocal-api To install, use Homebrew (brew install scouzi1966/afm/afm) or pip (pip install macafm); see the repository for the full feature list. Batch mode: with concurrent connections you can generate far more tokens by using multiple connections, which makes it suitable for multi-agent work with different contexts. It also has an --enable-prefix-cache flag to avoid wasting GPU resources recalculating the entire context in multi-turn conversations with agents.
Squeeze even more performance on MLX
Reddit r/LocalLLaMA / 3/19/2026
📰 News · Tools & Practical Usage
Key Points
- AFM MLX has been optimized for macOS to squeeze out more performance; it is a 100% native Swift, open-source solution.
- It can be installed using Homebrew (brew install scouzi1966/afm/afm) or via pip (pip install macafm).
- The update enables batch mode with concurrent connections to support multi-agent work across different contexts, boosting throughput (see the first sketch after this list).
- A new --enable-prefix-cache flag helps avoid recomputing the entire context in multi-turn conversations, saving GPU resources (see the second sketch after this list).
- The post includes a visual comparison between AFM and Python MLX and links to the GitHub repository for more details.
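
The batch-mode point is easiest to picture as a client firing several independent requests at the locally running server at once, one per agent context. The sketch below is an assumption-laden illustration rather than the project's documented client: the port (9999), the OpenAI-style /v1/chat/completions path, and the model name "afm" are placeholders to check against the repository's README.

```python
# Minimal sketch: several independent chat requests issued in parallel against a
# locally running AFM server. Endpoint URL, port, path, and model name are
# assumptions, not confirmed details from the project.
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:9999/v1/chat/completions"  # assumed port and path

def ask(prompt: str) -> str:
    payload = json.dumps({
        "model": "afm",  # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Each agent has its own context; concurrent connections let the server batch them.
prompts = ["Summarize file A", "Review diff B", "Draft tests for module C"]
with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
    for answer in pool.map(ask, prompts):
        print(answer[:80])
```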
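
For the --enable-prefix-cache flag, the relevant pattern is a multi-turn chat client that resends the whole conversation every turn; with prefix caching enabled on the server, the unchanged earlier turns should not need to be re-processed on the GPU each time. The sketch below reuses the same assumed endpoint and model name as the previous example.

```python
# Minimal sketch of a multi-turn chat loop. The full history is resent each turn;
# a server-side prefix cache can reuse the work done for the unchanged prefix.
# Endpoint, port, path, and model name are assumptions.
import json
import urllib.request

URL = "http://localhost:9999/v1/chat/completions"  # assumed port and path
history = [{"role": "system", "content": "You are a concise coding assistant."}]

def chat(user_msg: str) -> str:
    history.append({"role": "user", "content": user_msg})
    payload = json.dumps({"model": "afm", "messages": history}).encode()
    req = urllib.request.Request(URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("Explain prefix caching in one sentence."))
print(chat("Now give a concrete example."))  # earlier turns form a cacheable prefix
```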
Related Articles

Astral to Join OpenAI
Dev.to

I Built a MITM Proxy to See What Claude Code Actually Sends to Anthropic
Dev.to

Your AI coding agent is installing vulnerable packages. I built the fix.
Dev.to

ChatGPT Prompt Engineering for Freelancers: Unlocking Efficient Client Communication
Dev.to

PearlOS. We gave swarm intelligence a local desktop environment and code control to self-evolve. Has been pretty incredible to see so far. Open source and free if you want your own.
Reddit r/LocalLLaMA