I technically got an LLM running locally on a 1998 iMac G3 with 32 MB of RAM

Reddit r/LocalLLaMA / 4/6/2026


Key Points

  • A Reddit post describes successfully running an LLM locally on an unmodified 1998 iMac G3 (233 MHz, PowerPC 750) with just 32 MB of RAM using Mac OS 8.5.
  • The implementation uses Andrej Karpathy’s 260K-parameter TinyStories model (~1 MB checkpoint, Llama 2 architecture), reading the prompt from prompt.txt and writing the generated continuation to output.txt.
  • Because of classic Mac OS constraints, the author uses Retro68 cross-compilation (PEF binaries), fixes model/tokenizer endianness for PowerPC, and adjusts Mac memory allocation to enlarge the app heap.
  • Several blockers specific to the tiny target hardware are solved, including disabling the crash-prone RetroConsole output, correcting grouped-query attention tensor sizing (n_kv_heads vs n_heads) to prevent NaNs, and replacing malloc with static buffers for the KV cache and run state.
  • The project is presented as a fun demo with very short outputs, with the code published in a GitHub repo for others to experiment with.

Hardware:

• Stock iMac G3 Rev B (October 1998). 233 MHz PowerPC 750, 32 MB RAM, Mac OS 8.5. No upgrades.

• Model: Andrej Karpathy’s 260K TinyStories (Llama 2 architecture). ~1 MB checkpoint.

Toolchain:

• Cross-compiled from a Mac mini using Retro68 (GCC for classic Mac OS → PEF binaries)

• Endian-swapped model + tokenizer from little-endian to big-endian for PowerPC

• Files transferred via FTP to the iMac over Ethernet

Challenges:

• Mac OS 8.5 gives apps a tiny memory partition by default. Had to use MaxApplZone() + NewPtr() from the Mac Memory Manager to get enough heap

• RetroConsole crashes on this hardware, so all output writes to a text file you open in SimpleText

• The original llama2.c weight layout assumes n_kv_heads == n_heads. The 260K model uses grouped-query attention (kv_heads=4, heads=8), which shifted every pointer after wk and produced NaN. Fixed by using n_kv_heads * head_size for wk/wv sizing

• Static buffers for the KV cache and run state to avoid malloc failures on 32 MB

It reads a prompt from prompt.txt, tokenizes with BPE, runs inference, and writes the continuation to output.txt.

Obviously the output is very short, but this is definitely meant to just be a fun experiment/demo!

Here’s the repo link: https://github.com/maddiedreese/imac-llm

submitted by /u/maddiedreese