Gemma 4 actually running usable on an Android phone (not llama.cpp)
Reddit r/artificial / 4/19/2026
💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage

> I wanted a real local assistant on my phone, not a demo. First tried the usual llama.cpp in Termux — Gemma 4 was 2–3 tok/s and the phone was on fire. Then I switched to Google’s LiteRT setup, got Gemma 4 running smoothly, and wired it into an agent stack running in Termux. Now one Android phone is: a local LLM, an ADB app automator, and (optionally) a fully offline assistant. Happy to share details + code and hear what else you’d build on top of this. [link] [comments]

Key Points
- The post claims Gemma 4 runs at usable speed as a local LLM on a real Android phone when served through Google’s LiteRT setup, avoiding the sluggish performance the author hit with llama.cpp in Termux (a minimal on-device inference sketch follows this list).
- The author reports switching from llama.cpp (about 2–3 tokens/second and overheating) to LiteRT for smooth operation.
- They also describe integrating the phone-based model into an agent workflow running in Termux.
- The result is a single-phone setup that runs the LLM locally, automates apps via ADB (see the second sketch below), and can optionally operate fully offline.
- The author offers to share implementation details and code, and invites ideas for what else could be built on top of the setup.
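
For readers who want to try the LiteRT route, here is a minimal sketch of on-device generation using the MediaPipe LLM Inference task, which runs LiteRT-based models such as Gemma on Android. The post does not say which LiteRT entry point the author actually used; the model path, file name, and token limit below are placeholder assumptions.

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Minimal sketch: run a prompt through an on-device Gemma model with the
// MediaPipe LLM Inference task (LiteRT under the hood). The model path and
// maxTokens value are placeholder assumptions, not taken from the post.
fun generateOnDevice(context: Context, prompt: String): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/gemma.task") // assumed location of the converted model
        .setMaxTokens(512)
        .build()

    // Creating the engine is expensive; in a real app, build it once and reuse it.
    val llm = LlmInference.createFromOptions(context, options)
    return llm.generateResponse(prompt) // blocking; a streaming/async variant also exists
}
```

Packaging the model into a LiteRT-compatible bundle is a separate conversion step that the post does not detail.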
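The ADB-driven app automation mentioned in the last bullet could look roughly like the sketch below. It assumes the agent process can reach an `adb` binary that is already connected to the phone (for example over wireless debugging from Termux); the post does not describe the exact wiring, and the component name and coordinates are illustrative only.

```kotlin
import java.util.concurrent.TimeUnit

// Minimal helper: run one adb command and return its combined output.
// Assumes an `adb` binary on PATH that is already connected to the phone
// (e.g. over wireless debugging); the post does not describe this wiring.
fun adb(vararg args: String): String {
    val proc = ProcessBuilder("adb", *args)
        .redirectErrorStream(true)
        .start()
    val output = proc.inputStream.bufferedReader().readText().trim()
    proc.waitFor(30, TimeUnit.SECONDS)
    return output
}

fun main() {
    // Illustrative only: launch an app, tap a screen coordinate, type text.
    adb("shell", "am", "start", "-n", "com.android.deskclock/.DeskClock")
    adb("shell", "input", "tap", "540", "960")
    adb("shell", "input", "text", "hello%sfrom%sthe%sagent") // `input text` uses %s for spaces
}
```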