Karpathy's MicroGPT running at 50,000 tps on an FPGA

Reddit r/LocalLLaMA / 5/3/2026

📰 News · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage · Models & Research

Key Points

  • Karpathy’s MicroGPT is reportedly running at around 50,000 tokens per second on an FPGA using a very small model with just 4,192 parameters.
  • The write-up emphasizes that much of the throughput comes from keeping model weights on the chip (onboard ROM) instead of fetching them from external memory.
  • The post notes a practical limitation: with current FPGAs and 16-bit weights, onboard ROM caps the model size at roughly 20–30 million parameters.
  • It suggests that future increases in onboard ROM capacity—or FPGAs specialized for small language models (SLMs)—could enable larger models to achieve similar high inference speeds.
  • Project details and the related repository are provided for readers to inspect and reproduce the approach.

Sure, it's only 4,192 parameters, but it's a start. Project write-up: https://v2.talos.wtf/ and GitHub repository: https://github.com/Luthiraa/TALOS-V2

Some of the speed comes from keeping the weights on-chip rather than fetching them from external memory. With onboard ROM and 16-bit weights, current FPGAs max out at roughly 20-30 million parameters, but maybe this project and Taalas (https://taalas.com/ - the similar names are unlikely to be a coincidence) will lead to more onboard ROM in FPGAs, or to FPGAs dedicated to SLMs.
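The 20-30 million figure follows directly from dividing on-chip memory by weight width. A back-of-envelope sketch below; the memory capacity is an assumption (roughly the combined BRAM+URAM of a large current FPGA, and a real design would also need memory for activations and logic):

```python
# Sanity check of the ~20-30M parameter cap mentioned in the post.
# ONCHIP_BITS is an assumed figure for a large current FPGA's on-chip
# memory (~455 Mbit); actual capacity varies by part, and the model
# design itself consumes some of this memory.

ONCHIP_BITS = 455 * 10**6   # assumed on-chip memory, in bits
BITS_PER_WEIGHT = 16        # 16-bit weights, as in the post

max_params = ONCHIP_BITS // BITS_PER_WEIGHT
print(f"~{max_params / 1e6:.0f}M parameters fit on-chip")  # ≈ 28M
```

This lands inside the 20-30M range the post cites, which is why larger models currently have to spill weights to external memory and give up the throughput advantage.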

submitted by /u/jawondo