AI Navigate

AFM MLX on macOS - new version released! Great new features (macOS)

Reddit r/LocalLLaMA / 3/18/2026

📰 News · Developer Stack & Infrastructure · Tools & Practical Usage

Key Points

  • AFM MLX on macOS releases version 0.9.7 as a 100% Swift wrapper around MLX with more advanced inference features and no Python required.
  • The release adds support for more models than the baseline Swift MLX, broadening model availability on macOS.
  • Installation is straightforward via pip (pip install macafm) or Homebrew (brew install scouzi1966/afm/afm).
  • Telegram integration lets users chat with a local model through a Telegram bot, enabling remote interaction.
  • It ships an experimental tool parser (afm_adaptive_xml) and runtime options such as --enable-prefix-caching, --enable-grammar-constraints, --no-think, --concurrent, --guided-json, and --vlm for multimodal models, with notes on compatibility and defaults.

Visit the repo. 100% open source. Vibe-coded PRs accepted! It's a wrapper around MLX with more advanced inference features, and it supports more models than the baseline Swift MLX. It is 100% Swift; no Python is required. You can install it with pip, but that's the extent of Python's involvement.

New in 0.9.7
https://github.com/scouzi1966/maclocal-api

pip install macafm or brew install scouzi1966/afm/afm

Telegram integration: give it a bot ID and chat with your local model from anywhere via a Telegram client. The first phase is basic.

Experimental tool parser: afm_adaptive_xml. Lower-quant and lower-parameter-count models are not the best at producing tool calls that conform to the client schema.

--enable-prefix-caching: Enable radix tree prefix caching for KV cache reuse across requests
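Conceptually, radix-tree prefix caching remembers the KV cache for prompts already processed and, for each new request, reuses the cache for the longest shared token prefix. A minimal language-agnostic sketch (Python here for brevity; the class and method names are hypothetical, not the project's API):

```python
# Conceptual sketch of prefix caching: remember token sequences that
# have already been processed and, for a new request, find the longest
# cached prefix so its KV-cache entries can be reused instead of
# recomputed. A real radix tree finds this in O(prefix length);
# the linear scan below just keeps the idea visible.

class PrefixCache:
    def __init__(self):
        self.cached = []  # previously processed token sequences

    def insert(self, tokens):
        self.cached.append(list(tokens))

    def longest_cached_prefix(self, tokens):
        """Length of the longest cached prefix of `tokens`."""
        best = 0
        for entry in self.cached:
            n = 0
            while n < min(len(entry), len(tokens)) and entry[n] == tokens[n]:
                n += 1
            best = max(best, n)
        return best
```

In a real server, the returned prefix length tells the engine how many tokens of attention state can be reused across requests that share a system prompt or conversation history.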

--enable-grammar-constraints: Enable EBNF grammar-constrained decoding for tool calls (requires --tool-call-parser afm_adaptive_xml). Forces valid XML tool-call structure at generation time, preventing JSON-inside-XML and missing parameters. Integrates with xGrammar.
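The idea behind grammar-constrained decoding can be sketched as a small automaton that only admits tokens keeping the output inside the expected XML tool-call structure; a real engine such as xGrammar compiles EBNF rules into such an automaton and masks invalid tokens during sampling. The states and token strings below are invented for illustration:

```python
# Toy state machine for constrained decoding of an XML tool call:
# only token sequences of the form <tool_call> ... </tool_call> are
# accepted; anything else is rejected at the step where it deviates.

OPEN, BODY, DONE = "open", "body", "done"

def step(state, token):
    """Return the next state, or None if the grammar rejects `token`."""
    if state == OPEN and token == "<tool_call>":
        return BODY
    if state == BODY and token == "</tool_call>":
        return DONE
    if state == BODY:
        return BODY  # free-form arguments inside the call
    return None

def accepts(tokens):
    state = OPEN
    for t in tokens:
        state = step(state, t)
        if state is None:
            return False
    return state == DONE
```

Applied at generation time rather than after the fact, the same automaton prevents malformed structures (like JSON-inside-XML) from ever being emitted.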

--no-think: Disable thinking/reasoning. Useful for Qwen 3.5 models, which have some tendency to overthink.

--concurrent: Maximum concurrent requests (enables batch mode; 0 or 1 reverts to serial). For batch inference: get more throughput with parallel requests than with serialized requests.
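The throughput gain from batching can be estimated with back-of-envelope arithmetic: n requests at latency t each take roughly n·t serially, but about ceil(n/b)·t when the server processes b requests per step. The model below is illustrative only; real per-batch latency grows somewhat with batch size:

```python
# Illustrative serial-vs-batched timing model for concurrent requests.

import math

def serial_time(n_requests, latency):
    # One request at a time: total time scales linearly.
    return n_requests * latency

def batched_time(n_requests, batch_size, latency):
    # Batched: requests are processed batch_size at a time.
    return math.ceil(n_requests / batch_size) * latency
```

For example, 8 requests at 2 s each take about 16 s serially but about 4 s with a batch size of 4, under the (optimistic) assumption that a batch costs the same as a single request.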

--guided-json: Force schema-conforming output.

--vlm: Load multimodal models as a VLM. Text-only mode is on by default, which lets users bypass VLM processing for better pure-text output.

submitted by /u/scousi