SpruceChat runs Qwen2.5-0.5B on handheld gaming devices using llama.cpp. no cloud, no wifi needed. the model lives in RAM after first boot and tokens stream in one by one.
runs on: Miyoo A30, Miyoo Flip, Trimui Brick, Trimui Smart Pro
performance on the A30 (quad-core Cortex-A7):
- model load: ~60s on first boot
- generation: ~1-2 tokens/sec
- prompt eval: ~3 tokens/sec
it's not fast, but the output streams so you watch it think; at 1-2 tokens/sec, a 100-token reply takes a minute or two. 64-bit devices are quicker.
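if you're curious what that streaming loop looks like in code, here's a minimal sketch using the llama-cpp-python bindings against the same model. this isn't how SpruceChat does it (it ships the llama.cpp binaries directly), and the model filename is a placeholder:

```python
# sketch: streaming tokens from a Qwen2.5-0.5B GGUF via llama-cpp-python
# (SpruceChat itself calls the llama.cpp binaries; this is just an illustration)
from llama_cpp import Llama

# placeholder path -- point it at whatever GGUF quant you grabbed
llm = Llama(model_path="qwen2.5-0.5b-instruct-q4_k_m.gguf", n_ctx=2048)

# stream=True yields one chunk per token, so output appears as it's generated
for chunk in llm("Describe rain falling on a forest.", max_tokens=128, stream=True):
    print(chunk["choices"][0]["text"], end="", flush=True)
print()
```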
the AI has the personality of a spruce tree: patient, unhurried, quietly amazed by everything.
if the device is on wifi, you can also hit the llama-server from a browser on your phone or laptop and chat that way with a real keyboard.
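llama-server also exposes an OpenAI-compatible /v1/chat/completions endpoint, so you can script against it instead of using the browser. a minimal sketch; the device IP, port, and the spruce system prompt below are my assumptions, not the repo's actual values:

```python
# sketch: chatting with llama-server over the local network
# host/port and the system prompt are assumptions -- adjust for your setup
import json
import requests

URL = "http://192.168.1.50:8080/v1/chat/completions"  # hypothetical device IP, default port

resp = requests.post(
    URL,
    json={
        "messages": [
            # hypothetical stand-in for SpruceChat's actual persona prompt
            {"role": "system", "content": "You are a patient, unhurried spruce tree."},
            {"role": "user", "content": "What do you think of the wind today?"},
        ],
        "stream": True,  # server sends SSE chunks as tokens are generated
    },
    stream=True,
)

# each SSE event is a "data: {json}" line; the stream ends with "data: [DONE]"
for line in resp.iter_lines():
    if not line or not line.startswith(b"data: "):
        continue
    payload = line[len(b"data: "):]
    if payload == b"[DONE]":
        break
    delta = json.loads(payload)["choices"][0]["delta"]
    print(delta.get("content", ""), end="", flush=True)
print()
```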
repo: https://github.com/RED-BASE/SpruceChat
built with help from Claude. a collaborator is already working on expanding device support. the first release is up, with both armhf and aarch64 binaries and the model included.