Gemma 4 E2B runs surprisingly well on my 8GB Android phone, so I built a private voice notes app around it.

Reddit r/LocalLLaMA / 5/4/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage · Models & Research

Key Points

  • The author reports running Gemma 4 E2B locally on an 8GB Android phone and finding chat quality acceptable while being especially impressed by reliably structured, parseable JSON output.
  • Based on that behavior, they built a private Android voice-notes app that transcribes speech with Whisper Small and uses Gemma to split rambling notes into separate, tagged reminder items with resolved timing.
  • They describe end-to-end latency for a 10–15 second voice note as roughly 12–15 seconds total, with transcription around ~5 seconds and categorization/splitting around ~8–10 seconds, plus overhead for model loading, storage, and UI updates.
  • For searching, the app expands and rewrites user queries into keyword/hypothetical examples, merges multiple FTS retrieval lanes via reciprocal rank fusion, and optionally reranks top results with a reranker timeout.
  • The post invites others to share what local LLM models they run on phones and asks specifically whether categorization remains robust on real-world notes and how first-run behavior differs across devices.

Been running Gemma 4 E2B locally on my OnePlus CE 5 (8GB RAM) for a few months. Chat quality is fine for the size. What surprised me was JSON output. Give it a short input and a structured prompt and you get clean, parseable JSON back. Way better than I expected from a 2.4GB model on a phone.

Got me thinking about voice notes. You ramble for a few seconds, "call the dentist tomorrow at 3, also buy milk on the way home", and Gemma can split that into separate items, tag each one (reminder, buy), resolve the time. Tried it for a few weeks. Categorization is actually decent on real notes, not just the toy ones I started with.
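The splitting step can be sketched roughly like this. The prompt wording, the tag set, and the parsing helper are my assumptions for illustration, not the author's actual code:

```python
import json

# Hypothetical prompt/schema -- the post doesn't show the real one.
SPLIT_PROMPT = """Split the note into separate items.
Return ONLY a JSON array. Each item has "text", "tag" (reminder|buy|note),
and an optional "time" (ISO 8601).
Note: {note}"""

def parse_items(raw: str) -> list[dict]:
    """Parse the model's JSON output, tolerating stray text around the array."""
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end == -1:
        return []  # no array found: fall back to keeping the note as one item
    try:
        items = json.loads(raw[start : end + 1])
    except json.JSONDecodeError:
        return []
    # Keep only well-formed items that at least carry text.
    return [i for i in items if isinstance(i, dict) and "text" in i]

# The kind of output the post describes for
# "call the dentist tomorrow at 3, also buy milk on the way home":
raw = (
    '[{"text": "call the dentist", "tag": "reminder", "time": "2026-05-05T15:00"},'
    ' {"text": "buy milk on the way home", "tag": "buy"}]'
)
```

The defensive bracket-trimming matters because even models that usually emit clean JSON occasionally wrap it in prose.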

Built an Android app around it. Whisper Small (244MB) for transcription via Sherpa-ONNX, Gemma 4 E2B (2.4GB) for the splitting and categorization via LiteRT-LM. Both run on the phone, no cloud, no account.

End-to-end on the CE 5, a typical 10-15 second voice note takes about 12-15s. Whisper does transcription in ~5s, Gemma categorizes in ~8-10s, rest is model load + Room writes + UI hop.

At search time (for example, "what did I say about the dentist last week") it does query expansion, rewriting the user's question into keywords plus hypothetical example items before retrieval. Multiple FTS lanes get merged with reciprocal rank fusion, then there's an optional Gemma reranker pass over the top-K with a 15s timeout and fallback to RRF order if it doesn't finish.
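The reciprocal rank fusion merge is standard and easy to sketch; the lane contents here are made-up note IDs, and `k = 60` is the commonly used constant, not necessarily what the app uses:

```python
from collections import defaultdict

def rrf_merge(lanes: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists (one per FTS lane) with reciprocal rank fusion:
    score(doc) = sum over lanes of 1 / (k + rank), then sort by score."""
    scores: defaultdict[str, float] = defaultdict(float)
    for lane in lanes:
        for rank, doc_id in enumerate(lane, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Lane 1: FTS over the user's raw keywords; lane 2: FTS over expanded terms.
fused = rrf_merge([["note2", "note7", "note9"], ["note2", "note4"]])
```

RRF is a good fit here because the lanes return BM25-style scores on different scales; fusing on ranks instead of raw scores sidesteps any cross-lane normalization.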

Curious what people here are doing with local LLMs on their phones lately. Any other good models worth trying for on-device use?
If anyone wants to try it on their own device and share feedback, happy to share it. Mostly looking to know if the categorization holds up on real notes, and whether there's any weirdness on first model load across devices.

submitted by /u/Effective-Drawer9152