I technically got an LLM running locally on a 1998 iMac G3 with 32 MB of RAM

Reddit r/LocalLLaMA / 4/6/2026


Key Points

  • A Reddit post describes successfully running an LLM locally on an unmodified 1998 iMac G3 (233 MHz, PowerPC 750) with just 32 MB of RAM using Mac OS 8.5.
  • The implementation uses Andrej Karpathy’s 260K-parameter TinyStories model (~1 MB checkpoint, Llama 2 architecture), reading the prompt from prompt.txt and writing the generated continuation to output.txt.
  • Because of classic Mac OS constraints, the author uses Retro68 cross-compilation (PEF binaries), fixes model/tokenizer endianness for PowerPC, and adjusts Mac memory allocation to enlarge the app heap.
  • Several blockers specific to the tiny target hardware are solved, including disabling the crash-prone RetroConsole output, correcting grouped-query attention tensor sizing (n_kv_heads vs n_heads) to prevent NaNs, and replacing malloc with static buffers for the KV cache and run state.
  • The project is presented as a fun demo with very short outputs, with the code published in a GitHub repo for others to experiment with.

Hardware:

• Stock iMac G3 Rev B (October 1998). 233 MHz PowerPC 750, 32 MB RAM, Mac OS 8.5. No upgrades.

• Model: Andrej Karpathy’s 260K TinyStories (Llama 2 architecture). ~1 MB checkpoint.

Toolchain:

• Cross-compiled from a Mac mini using Retro68 (GCC for classic Mac OS → PEF binaries)

• Endian-swapped model + tokenizer from little-endian to big-endian for PowerPC

• Files transferred via FTP to the iMac over Ethernet

Challenges:

• Mac OS 8.5 gives apps a tiny memory partition by default. Had to use MaxApplZone() + NewPtr() from the Mac Memory Manager to get enough heap

• RetroConsole crashes on this hardware, so all output writes to a text file you open in SimpleText

• The original llama2.c weight layout assumes n_kv_heads == n_heads. The 260K model uses grouped-query attention (kv_heads=4, heads=8), which shifted every pointer after wk and produced NaN. Fixed by using n_kv_heads * head_size for wk/wv sizing

• Static buffers for the KV cache and run state to avoid malloc failures on 32 MB

It reads a prompt from prompt.txt, tokenizes with BPE, runs inference, and writes the continuation to output.txt.

Obviously the output is very short, but this is definitely meant to just be a fun experiment/demo!

Here’s the repo link: https://github.com/maddiedreese/imac-llm

submitted by /u/maddiedreese