Real-time AI (audio/video in, voice out) on an M3 Pro with Gemma E2B

Reddit r/LocalLLaMA / 4/6/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage

Key Points

  • A Reddit post shares a project (“parlor”) that enables real-time AI with audio/video input and voice output running on an M3 Pro using Gemma E2B.
  • The author argues this setup is particularly valuable for language learning: interactive, multilingual voice-based assistance that lets users fall back to their native language.
  • The post acknowledges the current model’s limitations for “agentic coding” while positioning the real-time multimodal experience as a “game-changer” for learners.
  • It suggests a forward-looking use case where similar functionality could eventually run locally on phones for camera-assisted object description and conversation.
  • The post points readers to the GitHub repository for hands-on experimentation and implementation details.

Sure, you can't do agentic coding with Gemma 4 E2B, but this model is a game-changer for people learning a new language.

Imagine, a few years from now, people running this locally on their phones. They could point their camera at objects and talk about them. And since the model is multilingual, they can always fall back to their native language if they want. This is essentially what OpenAI demoed a few years ago.
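For readers curious what such a real-time voice loop looks like structurally, here is a minimal sketch: audio chunks in, one model turn per utterance, spoken reply out. All functions (`capture_audio`, `model_reply`, `speak`) are hypothetical stand-ins for illustration, not parlor's actual API or Gemma's interface.

```python
# Hypothetical skeleton of a real-time voice loop. The capture, model,
# and speech functions are stubs standing in for a microphone stream,
# a local multimodal model, and on-device text-to-speech.
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class AudioChunk:
    samples: bytes            # raw PCM audio bytes
    is_end_of_utterance: bool  # set by an end-pointing/VAD step


def capture_audio() -> Iterator[AudioChunk]:
    # Stand-in for a live microphone stream; yields one fixed utterance.
    yield AudioChunk(b"\x00\x01", is_end_of_utterance=False)
    yield AudioChunk(b"\x02\x03", is_end_of_utterance=True)


def model_reply(utterance: List[AudioChunk]) -> str:
    # Stand-in for a local multimodal model that maps audio (and,
    # in the full setup, video frames) directly to a text reply.
    n_bytes = sum(len(c.samples) for c in utterance)
    return f"(reply to {n_bytes} bytes of audio)"


def speak(text: str) -> None:
    # Stand-in for on-device text-to-speech.
    print(text)


def run_loop() -> List[str]:
    # Buffer chunks until the utterance ends, then respond -- the same
    # shape a streaming pipeline takes, minus real I/O and threading.
    replies: List[str] = []
    buffer: List[AudioChunk] = []
    for chunk in capture_audio():
        buffer.append(chunk)
        if chunk.is_end_of_utterance:
            reply = model_reply(buffer)
            speak(reply)
            replies.append(reply)
            buffer = []
    return replies


if __name__ == "__main__":
    run_loop()
```

A real implementation would run capture, inference, and playback concurrently so the model can start responding while audio is still streaming; the stubs above only show the turn-taking structure.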

Repo: https://github.com/fikrikarim/parlor

submitted by /u/ffinzy