Running SmolLM2‑360M on a Samsung Galaxy Watch 4 (380MB RAM) – 74% RAM reduction in llama.cpp

Reddit r/LocalLLaMA / 4/2/2026

Key Points

The author reports successfully running SmolLM2-360M on a Samsung Galaxy Watch 4 Classic with ~380MB available RAM by modifying llama.cpp/ggml memory handling.
They identify the main bottleneck as the model being effectively loaded twice (once via the APK mmap page cache and again via ggml tensor allocations), causing peak RAM to rise to ~524MB.
By passing host_ptr into llama_model_params, CPU tensors directly reference the mmap region while only Vulkan tensors are copied, avoiding redundant RAM usage.
On the watch, the change reduces peak RAM from ~524MB to ~142MB (about a 74% reduction) and improves boot time (19s → 11s, with ~2.5s on subsequent boots after mmap/KV cache warm-up).
The author shares code and proposes a PR to ggml-org/llama.cpp, seeking feedback on the host_ptr/mmap approach for embedded deployment.

I’ve got SmolLM2‑360M running on a Samsung Galaxy Watch 4 Classic (about 380MB free RAM) by tweaking llama.cpp and the underlying ggml memory model. By default, the model was being loaded twice in RAM: once via the APK’s mmap page cache and again via ggml’s tensor allocations, peaking at 524MB for a 270MB model.
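A quick budget check makes the problem concrete. This sketch uses only the figures quoted above (270MB model, about 380MB free, 524MB observed peak); the function names are illustrative and do not come from llama.cpp.

```python
MODEL_MB = 270          # model size quoted in the post
FREE_RAM_MB = 380       # approximate free RAM on the watch
OBSERVED_PEAK_MB = 524  # peak RAM before the fix


def double_load_bound_mb(model_mb):
    """Upper bound on weight memory if every byte is resident twice."""
    return 2 * model_mb


def overshoot_mb(peak_mb, free_mb):
    """How far a peak exceeds the free-RAM budget (0 if it fits)."""
    return max(0, peak_mb - free_mb)


if __name__ == "__main__":
    print("double-load bound:", double_load_bound_mb(MODEL_MB), "MB")
    print("overshoot at observed peak:",
          overshoot_mb(OBSERVED_PEAK_MB, FREE_RAM_MB), "MB")
    print("overshoot with a single copy:",
          overshoot_mb(MODEL_MB, FREE_RAM_MB), "MB")
```

The observed peak sits just under the two-copy bound, which is consistent with the weights being resident twice, and a single copy of the weights fits within the budget.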

The fix: I pass host_ptr into llama_model_params, so CPU tensors point directly into the mmap region and only Vulkan tensors are copied. On real hardware this gives:

Peak RAM: 524MB → 142MB (74% reduction)
First boot: 19s → 11s
Second boot: ~2.5s (mmap + KV cache warm)
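The difference between copying tensors out of a mapping and pointing at the mapping directly can be sketched in a few lines. This is a Python illustration of the general zero-copy pattern, not the llama.cpp or ggml API and not the code from the branch: the tensor table, file contents, and function names are all hypothetical.

```python
import mmap
import os
import tempfile

# Hypothetical tensor table: name -> (byte offset, byte size) in the weights file.
TENSORS = {"tok_embd": (0, 4096), "blk0_attn_q": (4096, 8192)}


def write_fake_weights(path):
    """Write placeholder bytes standing in for a model file; return its size."""
    total = sum(size for _, size in TENSORS.values())
    with open(path, "wb") as f:
        f.write(bytes(i % 251 for i in range(total)))
    return total


def load_copying(mapping):
    """Copy path: every tensor gets its own heap buffer filled from the mapping."""
    return {name: bytearray(mapping[off:off + size])
            for name, (off, size) in TENSORS.items()}


def load_zero_copy(base):
    """Zero-copy path: every tensor is a slice of one view over the mapping."""
    return {name: base[off:off + size] for name, (off, size) in TENSORS.items()}


def compare_loaders(path):
    """Load the same file both ways and report what each path allocated."""
    with open(path, "rb") as f:
        mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    base = memoryview(mapping)
    views = load_zero_copy(base)
    try:
        copied = load_copying(mapping)
        return {
            "copied_bytes": sum(len(buf) for buf in copied.values()),
            "views_share_mapping": all(v.obj is mapping for v in views.values()),
            "contents_match": all(bytes(views[n]) == bytes(copied[n])
                                  for n in TENSORS),
        }
    finally:
        # Views must be released before the mapping can be closed.
        for v in views.values():
            v.release()
        base.release()
        mapping.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        weights = os.path.join(tmp, "weights.bin")
        write_fake_weights(weights)
        print(compare_loaders(weights))
```

The copy path allocates as many bytes again as the file holds, while the zero-copy path allocates nothing beyond the mapping itself. That is the same trade the post describes for CPU tensors; tensors destined for a separate device buffer (Vulkan, in the post) still need a copy.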

Code:
https://github.com/Perinban/llama.cpp/tree/axon‑dev

Longer write‑up with VmRSS traces and design notes:
https://www.linkedin.com/posts/perinban-parameshwaran_machinelearning-llm-embeddedai-activity-7445374117987373056-xDj9?utm_source=share&utm_medium=member_desktop&rcm=ACoAAA1J2KoBHgKFnrEIUchmbOoZTpAqKKxKK7o
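For anyone who wants to collect comparable VmRSS numbers, here is a minimal sketch that reads the `VmRSS` field from `/proc/<pid>/status` on Linux or Android. It is not the tooling used for the traces in the write-up, and the helper names are hypothetical.

```python
import re


def parse_vmrss_kb(status_text):
    """Return VmRSS in kB from /proc/<pid>/status contents, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_vmrss_kb(pid="self"):
    """Read VmRSS for a process; returns None where /proc is unavailable."""
    try:
        with open(f"/proc/{pid}/status") as f:
            return parse_vmrss_kb(f.read())
    except OSError:
        return None


if __name__ == "__main__":
    print("current VmRSS (kB):", read_vmrss_kb())
```

VmRSS includes resident file-backed pages as well as anonymous memory, so touched pages of an mmap'd model file count toward it.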

I’m planning a PR to ggml‑org/llama.cpp; feedback on the host‑ptr / mmap pattern is welcome.

submitted by /u/RecognitionFlat1470