Visit the repo. 100% open source; vibe-coded PRs accepted! It's a wrapper around MLX with more advanced inference features, and it supports more models than baseline Swift MLX. It's 100% Swift: no Python required. You can install it with pip, but that's the extent of the Python involvement.
New in 0.9.7
https://github.com/scouzi1966/maclocal-api
pip install macafm or brew install scouzi1966/afm/afm
Telegram integration: give it a bot ID and chat with your local model from anywhere via a Telegram client. The first phase is basic.
Experimental tool parser: afm_adaptive_xml. Lower-quant and smaller-parameter models are not the best at tool-calling compliance, i.e. conforming to the client's schema.
--enable-prefix-caching: Enable radix-tree prefix caching so the KV cache can be reused across requests
--enable-grammar-constraints: Enable EBNF grammar-constrained decoding for tool calls (requires --tool-call-parser afm_adaptive_xml). Forces valid XML tool-call structure at generation time, preventing JSON-inside-XML and missing parameters. Integrates with xGrammar
--no-think: Disable thinking/reasoning. Useful for Qwen 3.5, which has a tendency to overthink
--concurrent: Max concurrent requests (enables batch mode; 0 or 1 reverts to serial). For batch inference: parallel requests get more throughput than serialized ones
--guided-json: Force output to conform to a JSON schema
--vlm: Load multimodal models as VLMs. Text-only mode is the default, which bypasses the vision path for better pure-text output; this flag opts back into VLM loading
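To illustrate what --enable-prefix-caching buys you, here is a minimal conceptual sketch of radix-style prefix matching (not AFM's actual implementation): if a new request's prompt shares a leading run of tokens with an earlier request, the KV cache already computed for that prefix can be reused and only the tail needs fresh computation.

```python
# Toy radix/trie prefix cache over token IDs. In a real server the
# matched prefix maps to stored KV-cache blocks; here we only track
# how many leading tokens were seen before.

class PrefixCache:
    def __init__(self):
        self.root = {}  # token -> child node

    def insert(self, tokens):
        """Record a prompt's token sequence in the trie."""
        node = self.root
        for t in tokens:
            node = node.setdefault(t, {})

    def longest_prefix(self, tokens):
        """Return how many leading tokens were already cached."""
        node, matched = self.root, 0
        for t in tokens:
            if t not in node:
                break
            node = node[t]
            matched += 1
        return matched

cache = PrefixCache()
cache.insert([1, 2, 3, 4])                    # first request's prompt
reused = cache.longest_prefix([1, 2, 3, 9])   # second request shares a prefix
# reused == 3: only the 4th token needs new KV computation
```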
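On the client side, the throughput gain from --concurrent comes from sending requests in parallel rather than one at a time. A minimal sketch, with a stand-in function in place of the real HTTP call to the AFM endpoint:

```python
import concurrent.futures

def complete(prompt: str) -> str:
    # Placeholder for a real request to the locally served model;
    # swap in an actual HTTP call in practice.
    return f"echo: {prompt}"

prompts = ["hello", "what is MLX?", "summarize this"]

# Fire all prompts concurrently; a server started with --concurrent N
# can batch these instead of serving them serially.
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(complete, prompts))
# results preserves the order of `prompts`
```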