AFM MLX has been optimized to squeeze even more performance out of macOS than the Python version. It is 100% native Swift and 100% open source. https://github.com/scouzi1966/maclocal-api To install, use Homebrew (brew install scouzi1966/afm/afm) or pip (pip install macafm); see the repository for the full feature list. Batch mode: with concurrent connections you can generate far more tokens by using multiple connections, which makes it suitable for multi-agent work with different contexts. It also has an --enable-prefix-cache flag to avoid wasting GPU resources recalculating the entire context in multi-turn conversations with agents.
Squeeze even more performance on MLX
Reddit r/LocalLLaMA / 3/19/2026
📰 News · Tools & Practical Usage
Key Points
- AFM MLX has been optimized for macOS to squeeze out more performance; it is a 100% native Swift, open-source solution.
- It can be installed using Homebrew (brew install scouzi1966/afm/afm) or via pip (pip install macafm).
- The update enables batch mode with concurrent connections to support multi-agent work across different contexts, boosting throughput (see the first sketch after this list).
- A new --enable-prefix-cache flag helps avoid recomputing the entire context in multi-turn conversations, saving GPU resources (see the second sketch after this list).
- The post includes a visual comparison between AFM and Python MLX and links to the GitHub repository for more details.
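
The batch-mode point is easiest to picture as a client firing several independent requests at the locally running server at once, one per agent context. The sketch below is an assumption-laden illustration rather than the project's documented client: the port (9999), the OpenAI-style /v1/chat/completions path, and the model name "afm" are placeholders to check against the repository's README.

```python
# Minimal sketch: several independent chat requests issued in parallel against a
# locally running AFM server. Endpoint URL, port, path, and model name are
# assumptions, not confirmed details from the project.
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:9999/v1/chat/completions"  # assumed port and path

def ask(prompt: str) -> str:
    payload = json.dumps({
        "model": "afm",  # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Each agent has its own context; concurrent connections let the server batch them.
prompts = ["Summarize file A", "Review diff B", "Draft tests for module C"]
with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
    for answer in pool.map(ask, prompts):
        print(answer[:80])
```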
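
For the --enable-prefix-cache flag, the relevant pattern is a multi-turn chat client that resends the whole conversation every turn; with prefix caching enabled on the server, the unchanged earlier turns should not need to be re-processed on the GPU each time. The sketch below reuses the same assumed endpoint and model name as the previous example.

```python
# Minimal sketch of a multi-turn chat loop. The full history is resent each turn;
# a server-side prefix cache can reuse the work done for the unchanged prefix.
# Endpoint, port, path, and model name are assumptions.
import json
import urllib.request

URL = "http://localhost:9999/v1/chat/completions"  # assumed port and path
history = [{"role": "system", "content": "You are a concise coding assistant."}]

def chat(user_msg: str) -> str:
    history.append({"role": "user", "content": user_msg})
    payload = json.dumps({"model": "afm", "messages": history}).encode()
    req = urllib.request.Request(URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("Explain prefix caching in one sentence."))
print(chat("Now give a concrete example."))  # earlier turns form a cacheable prefix
```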
Related Articles

Astral to Join OpenAI
Dev.to

I Built a MITM Proxy to See What Claude Code Actually Sends to Anthropic
Dev.to

Your AI coding agent is installing vulnerable packages. I built the fix.
Dev.to

ChatGPT Prompt Engineering for Freelancers: Unlocking Efficient Client Communication
Dev.to

PearlOS. We gave swarm intelligence a local desktop environment and code control to self-evolve. Has been pretty incredible to see so far. Open source and free if you want your own.
Reddit r/LocalLLaMA