Here's another sneak peek at inference of the Llama3.2-1B-Instruct model on 3x Mac Mini M4 (16 GB each) with smolcluster!
Today's demo is my data-parallelism implementation using an all-to-all architecture, written from scratch using only socket libraries for communication.
- Data parallelism splits the data across many GPUs, but each GPU holds a full copy of the model. It's used when the data doesn't fit on a single GPU.
- I went for an all-to-all architecture where each worker is connected to every other worker. For inference, every worker sends its activations to all the others, and each one takes a simple arithmetic average of all the activations before decoding starts (see the sketch after this list).
- That means you can pick any of the workers and chat with it directly, unlike in a master-worker setup where you can only communicate with the server.
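To make the all-to-all exchange concrete, here's a minimal Python sketch of what one averaging round could look like over raw TCP sockets. This is not the actual smolcluster code: the function names (`all_to_all_average`, `send_array`), the pickle-over-TCP framing, and the port handling are all my own assumptions for illustration.

```python
# Hypothetical sketch of an all-to-all activation exchange over raw TCP
# sockets -- assumed names and framing, not smolcluster's actual code.
import pickle
import socket
import struct
import threading
import time

import numpy as np


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from the socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed connection")
        buf += chunk
    return buf


def send_array(sock: socket.socket, arr: np.ndarray) -> None:
    """Send one numpy array as a length-prefixed pickle."""
    payload = pickle.dumps(arr)
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_array(sock: socket.socket) -> np.ndarray:
    """Receive one length-prefixed pickled numpy array."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return pickle.loads(_recv_exact(sock, length))


def all_to_all_average(my_activations: np.ndarray, my_port: int,
                       peers: list[tuple[str, int]]) -> np.ndarray:
    """Swap activations with every peer, return the arithmetic mean."""
    received: list[np.ndarray] = []

    def serve() -> None:
        # Accept one inbound connection per peer, collect their activations.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
            srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            srv.bind(("0.0.0.0", my_port))
            srv.listen(len(peers))
            for _ in peers:
                conn, _ = srv.accept()
                with conn:
                    received.append(recv_array(conn))

    server = threading.Thread(target=serve)
    server.start()

    # Push our activations to every other worker, retrying while their
    # server sockets come up.
    for host, port in peers:
        for _attempt in range(10):
            try:
                with socket.create_connection((host, port), timeout=5) as s:
                    send_array(s, my_activations)
                break
            except OSError:
                time.sleep(0.5)
        else:
            raise ConnectionError(f"could not reach {host}:{port}")

    server.join()
    # Every worker computes the same elementwise average.
    return np.mean([my_activations, *received], axis=0)
```

Each Mac Mini would run this with its own port and the other two workers' addresses in `peers`. Since everyone averages the same set of tensors, all three workers end up with identical activations, which is why any of them can handle decoding.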
That's it for the basic theory of DP for inference with an all-to-all architecture!
Setup:
- 3x Mac Mini M4 (2025), 16 GB RAM each
- Thunderbolt 4 cables