AI Navigate

Running TinyLlama 1.1B locally on a PowerBook G4 from 2002. Mac OS 9, no internet, installed from a CD.

Reddit r/LocalLLaMA / 3/20/2026

📰 News · Tools & Practical Usage

Key Points

  • MacinAI Local is a complete local AI inference platform that runs natively on classic Macintosh hardware with Mac OS 9 and no internet access required.
  • It is model-agnostic and supports GPT-2 (124M), TinyLlama, Qwen (0.5B), SmolLM, and other HuggingFace/LLaMA-architecture models via a Python export script.
  • The project uses a custom C89 inference engine, a 100M parameter Macintosh-specific transformer, and AltiVec SIMD optimizations that deliver about a 7.3x speedup on PowerPC G4, achieving 0.33 seconds per token with Q8 quantization.
  • Disk paging enables running inference on machines with limited RAM by streaming layers from disk, demonstrated on a 1GB RAM PowerBook G4.
  • Agentic Mac control allows the model to generate AppleScript for launching apps, managing files, and automating system tasks, with a safety confirmation before execution.

Hey everyone! I've been working on this for months and today's the day. MacinAI Local is a complete local AI inference platform that runs natively on classic Macintosh hardware, no internet required.

What makes this different from previous retro AI projects:

Every "AI on old hardware" project I've seen (llama98.c on Windows 98, llama2.c64 on Commodore 64, llama2 on DOS) ports Karpathy's llama2.c with a single tiny 260K-parameter model. MacinAI Local is a ground-up platform:

  • Custom C89 inference engine: not a port of llama.cpp or llama2.c. Written from scratch targeting Mac Toolbox APIs and classic Mac OS memory management.
  • Model-agnostic: runs GPT-2 (124M), TinyLlama, Qwen (0.5B), SmolLM, and any HuggingFace/LLaMA-architecture model via a Python export script. Not locked to one toy model.
  • 100M parameter custom transformer: trained on 1.1GB of Macintosh-specific text (Inside Macintosh, MacWorld, Usenet archives, programming references).
  • AltiVec SIMD optimization: 7.3x speedup on PowerPC G4. Went from 2.4 sec/token (scalar) down to 0.33 sec/token with Q8 quantization and 4-wide unrolled vector math with cache prefetch.
  • Agentic Mac control: the model generates AppleScript to launch apps, manage files, open control panels, and automate system tasks. It asks for confirmation before executing anything.
  • Disk paging: layers that don't fit in RAM get paged from disk, so even machines with limited memory can run inference. TinyLlama 1.1B runs on a machine with 1GB RAM by streaming layers from the hard drive.
  • Speech Manager integration: the Mac speaks every response aloud using PlainTalk voices.
  • BPE tokenizer: 8,205 tokens including special command tokens for system actions.

The demo hardware:

PowerBook G4 Titanium (2002), 1GHz G4, 1GB RAM, running Mac OS 9.2.2.

Real hardware performance (PowerBook G4 1GHz, Mac OS 9.2, all Q8):

| Model | Params | Q8 Size | Tokens/sec | Per token | Notes |
|---|---|---|---|---|---|
| MacinAI Tool v7 | 94M | 107 MB | 2.66 tok/s | 0.38s | Custom tool model, AppleScript |
| GPT-2 | 124M | 141 MB | 1.45 tok/s | 0.69s | Text completion |
| SmolLM 360M | 360M | 394 MB | 0.85 tok/s | 1.18s | Chat model |
| Qwen 2.5 0.5B | 494M | 532 MB | 0.63 tok/s | 1.59s | Best quality |
| TinyLlama 1.1B | 1.1B | 1.18 GB | 0.10 tok/s | 9.93s | Disk paging (24.5 min for 113 tok) |

Technical specs:

| Spec | Details |
|---|---|
| Language | C89 (CodeWarrior Pro 5) |
| Target OS | System 7.5.3 through Mac OS 9.2.2 |
| Target CPUs | 68000, 68030, 68040, PowerPC G3, G4 |
| Quantization | Float32, Q8_0 (int8 per-group) |
| Architectures | LLaMA-family (RMSNorm/SwiGLU/RoPE) + GPT-2 family (LayerNorm/GeLU/learned positional embeddings) |
| Arena allocator | Single contiguous block, 88% of physical RAM, no fragmentation |
| AltiVec speedup | 7.3x over scalar baseline |

What's next:

Getting the 68040 build running on a 1993 LC 575 / Color Classic Mystic. The engine already supports that architecture; I just need the hardware in hand.

Demo: https://youtu.be/W0kV_CCzTAM

Technical write-up: https://oldapplestuff.com/blog/MacinAI-Local/

Happy to answer any technical questions. I've got docs on the AltiVec optimization journey (finding a CodeWarrior compiler bug along the way), the training pipeline, and the model export process.

Thanks for the read!

submitted by /u/SDogAlex