Gemma 4 actually running usable on an Android phone (not llama.cpp)
Reddit r/artificial / 4/19/2026
💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage

> I wanted a real local assistant on my phone, not a demo. First tried the usual llama.cpp in Termux — Gemma 4 was 2–3 tok/s and the phone was on fire. Then I switched to Google’s LiteRT setup, got Gemma 4 running smoothly, and wired it into an agent stack running in Termux. Now one Android phone is: a local LLM, an ADB app automator, and (optionally) a fully offline assistant. Happy to share details + code and hear what else you’d build on top of this. [link] [comments]

Key Points
- The post claims Gemma 4 runs at usable speed as a local LLM on a real Android phone when served through Google’s LiteRT setup, avoiding the sluggish performance the author hit with llama.cpp in Termux (a minimal on-device inference sketch follows this list).
- The author reports switching from llama.cpp (about 2–3 tokens/second and overheating) to LiteRT for smooth operation.
- They also describe integrating the phone-based model into an agent workflow running in Termux.
- The result is a single-phone setup that runs the LLM locally, automates apps via ADB (see the second sketch below), and can optionally operate fully offline.
- The author offers to share implementation details and code, and invites ideas for what else could be built on top of the setup.
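
For readers who want to try the LiteRT route, here is a minimal sketch of on-device generation using the MediaPipe LLM Inference task, which runs LiteRT-based models such as Gemma on Android. The post does not say which LiteRT entry point the author actually used; the model path, file name, and token limit below are placeholder assumptions.

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Minimal sketch: run a prompt through an on-device Gemma model with the
// MediaPipe LLM Inference task (LiteRT under the hood). The model path and
// maxTokens value are placeholder assumptions, not taken from the post.
fun generateOnDevice(context: Context, prompt: String): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/gemma.task") // assumed location of the converted model
        .setMaxTokens(512)
        .build()

    // Creating the engine is expensive; in a real app, build it once and reuse it.
    val llm = LlmInference.createFromOptions(context, options)
    return llm.generateResponse(prompt) // blocking; a streaming/async variant also exists
}
```

Packaging the model into a LiteRT-compatible bundle is a separate conversion step that the post does not detail.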
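The ADB-driven app automation mentioned in the last bullet could look roughly like the sketch below. It assumes the agent process can reach an `adb` binary that is already connected to the phone (for example over wireless debugging from Termux); the post does not describe the exact wiring, and the component name and coordinates are illustrative only.

```kotlin
import java.util.concurrent.TimeUnit

// Minimal helper: run one adb command and return its combined output.
// Assumes an `adb` binary on PATH that is already connected to the phone
// (e.g. over wireless debugging); the post does not describe this wiring.
fun adb(vararg args: String): String {
    val proc = ProcessBuilder("adb", *args)
        .redirectErrorStream(true)
        .start()
    val output = proc.inputStream.bufferedReader().readText().trim()
    proc.waitFor(30, TimeUnit.SECONDS)
    return output
}

fun main() {
    // Illustrative only: launch an app, tap a screen coordinate, type text.
    adb("shell", "am", "start", "-n", "com.android.deskclock/.DeskClock")
    adb("shell", "input", "tap", "540", "960")
    adb("shell", "input", "text", "hello%sfrom%sthe%sagent") // `input text` uses %s for spaces
}
```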