I'm self-hosting a totally free voice AI on my home server to help people learn to speak English. It has tens to hundreds of monthly active users, and I've been thinking about how to keep it free while making it sustainable. The ultimate way to reduce operational costs is to run everything on-device, eliminating server costs entirely. So I decided to replicate the voice AI experience fully locally on my iPhone 15, and it's working better than I expected. One key thing that makes the app possible is using FluidAudio to offload STT and TTS to the Neural Engine, so llama.cpp can fully utilize the GPU without any contention.
Fully local voice AI on iPhone
Reddit r/LocalLLaMA / 3/26/2026
💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage
Key Points
- A developer describes building a fully local voice AI experience on an iPhone 15 to eliminate server costs and keep a voice-learning service free and sustainable.
- The setup uses FluidAudio to offload speech-to-text (STT) and text-to-speech (TTS) to the iPhone’s Neural Engine, allowing llama.cpp to run more effectively on the GPU without contention.
- They report that the on-device implementation performs better than expected and note that the service previously ran on a home server.
- A GitHub repository (volocal) is shared so others can replicate the approach.
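The compute split described above can be sketched in Swift. This is a hypothetical illustration of the pipeline's shape, not the actual FluidAudio or llama.cpp APIs: all type and function names here are invented stand-ins, with STT/TTS stubs representing Neural Engine work and an LLM stub representing a GPU-backed llama.cpp model.

```swift
import Foundation

// Illustrative sketch of the architecture from the post: STT and TTS
// run on the Neural Engine while the LLM runs on the GPU, so the two
// workloads do not contend for the same accelerator. All names below
// are hypothetical placeholders, not real FluidAudio/llama.cpp APIs.

protocol SpeechToText { func transcribe(_ audio: [Float]) -> String }
protocol TextToSpeech { func synthesize(_ text: String) -> [Float] }
protocol LanguageModel { func reply(to prompt: String) -> String }

// Stubs standing in for ANE-backed STT/TTS and a GPU-backed LLM.
struct StubSTT: SpeechToText {
    func transcribe(_ audio: [Float]) -> String { "How do I say hello?" }
}
struct StubLLM: LanguageModel {
    func reply(to prompt: String) -> String { "You say: hello!" }
}
struct StubTTS: TextToSpeech {
    // One sample per character, just to give the stub a measurable output.
    func synthesize(_ text: String) -> [Float] {
        Array(repeating: 0, count: text.count)
    }
}

// One voice turn: audio -> text -> LLM reply -> synthesized audio.
func voiceTurn(audio: [Float],
               stt: SpeechToText,
               llm: LanguageModel,
               tts: TextToSpeech) -> [Float] {
    let userText = stt.transcribe(audio)
    let replyText = llm.reply(to: userText)
    return tts.synthesize(replyText)
}

let samples = voiceTurn(audio: [0.0],
                        stt: StubSTT(),
                        llm: StubLLM(),
                        tts: StubTTS())
print("synthesized \(samples.count) samples")
```

The design point the post makes is the routing, not the stubs: because the STT/TTS stages target the Neural Engine and the LLM targets the GPU, the three stages of each turn avoid fighting over one accelerator, which is what makes a real-time loop feasible on a phone.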
Related Articles
- The Security Gap in MCP Tool Servers (And What I Built to Fix It) (Dev.to)
- Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption. (Dev.to)
- I made a new programming language to get better coding with less tokens. (Dev.to)
- RSA Conference 2026: The Week Vibe Coding Security Became Impossible to Ignore (Dev.to)