AI Navigate

Running TinyLlama 1.1B locally on a PowerBook G4 from 2002. Mac OS 9, no internet, installed from a CD.

Reddit r/LocalLLaMA / 3/20/2026

📰 News · Tools & Practical Usage

Key Points

  • MacinAI Local is a complete local AI inference platform that runs natively on classic Macintosh hardware with Mac OS 9 and no internet access required.
  • It is model-agnostic and supports GPT-2 (124M), TinyLlama, Qwen (0.5B), SmolLM, and other HuggingFace/LLaMA-architecture models via a Python export script.
  • The project uses a custom C89 inference engine, a 100M parameter Macintosh-specific transformer, and AltiVec SIMD optimizations that deliver about a 7.3x speedup on PowerPC G4, achieving 0.33 seconds per token with Q8 quantization.
  • Disk paging enables running inference on machines with limited RAM by streaming layers from disk, demonstrated on a 1GB RAM PowerBook G4.
  • Agentic Mac control allows the model to generate AppleScript for launching apps, managing files, and automating system tasks, with a safety confirmation before execution.

Hey everyone! I've been working on this for months and today's the day. MacinAI Local is a complete local AI inference platform that runs natively on classic Macintosh hardware, no internet required.

What makes this different from previous retro AI projects:

Every "AI on old hardware" project I've seen (llama98.c on Windows 98, llama2.c64 on Commodore 64, llama2 on DOS) ports Karpathy's llama2.c with a single tiny 260K-parameter model. MacinAI Local is a ground-up platform:

  • Custom C89 inference engine: not a port of llama.cpp or llama2.c. Written from scratch targeting Mac Toolbox APIs and classic Mac OS memory management.
  • Model-agnostic: runs GPT-2 (124M), TinyLlama, Qwen (0.5B), SmolLM, and any HuggingFace/LLaMA-architecture model via a Python export script. Not locked to one toy model.
  • 100M parameter custom transformer: trained on 1.1GB of Macintosh-specific text (Inside Macintosh, MacWorld, Usenet archives, programming references).
  • AltiVec SIMD optimization: 7.3x speedup on PowerPC G4. Went from 2.4 sec/token (scalar) down to 0.33 sec/token with Q8 quantization and 4-wide unrolled vector math with cache prefetch.
  • Agentic Mac control: the model generates AppleScript to launch apps, manage files, open control panels, and automate system tasks. It asks for confirmation before executing anything.
  • Disk paging: layers that don't fit in RAM get paged from disk, so even machines with limited memory can run inference. TinyLlama 1.1B runs on a machine with 1GB RAM by streaming layers from the hard drive.
  • Speech Manager integration: the Mac speaks every response aloud using PlainTalk voices.
  • BPE tokenizer: 8,205 tokens including special command tokens for system actions.

The demo hardware:

PowerBook G4 Titanium (2002), 1GHz G4, 1GB RAM, running Mac OS 9.2.2.

Real hardware performance (PowerBook G4 1GHz, Mac OS 9.2, all Q8):

| Model | Params | Q8 Size | Tokens/sec | Per token | Notes |
|---|---|---|---|---|---|
| MacinAI Tool v7 | 94M | 107 MB | 2.66 tok/s | 0.38s | Custom tool model, AppleScript |
| GPT-2 | 124M | 141 MB | 1.45 tok/s | 0.69s | Text completion |
| SmolLM 360M | 360M | 394 MB | 0.85 tok/s | 1.18s | Chat model |
| Qwen 2.5 0.5B | 494M | 532 MB | 0.63 tok/s | 1.59s | Best quality |
| TinyLlama 1.1B | 1.1B | 1.18 GB | 0.10 tok/s | 9.93s | Disk paging (24.5 min for 113 tok) |

Technical specs:

| Spec | Details |
|---|---|
| Language | C89 (CodeWarrior Pro 5) |
| Target OS | System 7.5.3 through Mac OS 9.2.2 |
| Target CPUs | 68000, 68030, 68040, PowerPC G3, G4 |
| Quantization | Float32, Q8_0 (int8 per-group) |
| Architectures | LLaMA-family (RMSNorm/SwiGLU/RoPE) + GPT-2 family (LayerNorm/GeLU/learned positional embeddings) |
| Arena allocator | Single contiguous block, 88% of physical RAM, no fragmentation |
| AltiVec speedup | 7.3x over scalar baseline |

What's next:

Getting the 68040 build running on a 1993 LC 575 / Color Classic Mystic. The engine already supports that architecture; I just need the hardware in hand.

Demo: https://youtu.be/W0kV_CCzTAM

Technical write-up: https://oldapplestuff.com/blog/MacinAI-Local/

Happy to answer any technical questions. I've got docs on the AltiVec optimization journey (finding a CodeWarrior compiler bug along the way), the training pipeline, and the model export process.

Thanks for the read!

submitted by /u/SDogAlex