AI Navigate

I built a screen-free, storytelling toy for kids with Qwen3-TTS

Reddit r/LocalLLaMA / 3/16/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • The author built an open-source, screen-free storytelling toy for kids that enables interactive conversations with story characters without sending transcripts to cloud providers.
  • Its voice AI stack pairs an ESP32 (on Arduino) with a MacBook over Secure WebSockets, using MLX-audio for STT (Whisper) and TTS (Qwen3-TTS, chatterbox-turbo), MLX-vlm for vision-language models (Qwen3.5-9B, Mistral), and MLX-lm for LLMs (Qwen3, Llama3.2).
  • The project supports on-device inference on Apple Silicon (M1/M2/M3/M4/M5) with Windows support planned for the future.
  • The code is available at the open-toys GitHub repo (https://github.com/akdeb/open-toys), inviting community feedback.

I built an open-source storytelling toy for my nephew, who uses a Yoto toy. My sister told me he sometimes talks to the stories, and I thought it would be cool if he could actually talk to those characters without sending the conversation transcript to cloud providers.

This is my voice AI stack:

  1. ESP32 on Arduino to interface with the voice AI pipeline
  2. MLX-audio for STT (Whisper) and TTS (`qwen3-tts` / `chatterbox-turbo`)
  3. MLX-vlm for vision-language models like Qwen3.5-9B and Mistral
  4. MLX-lm for LLMs like Qwen3 and Llama3.2
  5. Secure WebSockets to interface with a MacBook
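To make the flow concrete, here's a minimal sketch of one conversational turn on the Mac side of a stack like the one above. This is not the repo's actual code: `transcribe`, `respond`, and `synthesize` are hypothetical stand-ins for the MLX-audio STT, MLX-lm, and MLX-audio TTS calls, and the real version would run inside a Secure WebSocket handler receiving audio frames from the ESP32.

```python
# Hypothetical sketch of the Mac-side voice pipeline. The three helpers
# below are placeholders, NOT real MLX APIs: in the actual stack they
# would call into MLX-audio (Whisper STT), MLX-lm (Qwen3 / Llama3.2),
# and MLX-audio TTS (qwen3-tts / chatterbox-turbo).
import asyncio


def transcribe(audio: bytes) -> str:
    # Placeholder for MLX-audio Whisper STT.
    return audio.decode("utf-8", errors="ignore")


def respond(character: str, text: str) -> str:
    # Placeholder for an MLX-lm chat call with a story-character
    # system prompt.
    return f"[{character}] You said: {text}"


def synthesize(text: str) -> bytes:
    # Placeholder for MLX-audio TTS returning audio bytes.
    return text.encode("utf-8")


async def handle_turn(audio_in: bytes, character: str = "Narrator") -> bytes:
    """One turn: mic audio in -> STT -> character LLM -> TTS -> audio out.

    Async to mirror the WebSocket handler this would live in: the ESP32
    streams mic audio up, and the synthesized reply streams back down.
    """
    text = transcribe(audio_in)
    reply = respond(character, text)
    return synthesize(reply)


if __name__ == "__main__":
    out = asyncio.run(handle_turn(b"tell me a story"))
    print(out.decode())
```

Keeping the turn handler async matters here: while one model call is running, the event loop can still accept the next audio frames from the toy, so the pipeline stays responsive on a single MacBook.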

The repo supports on-device inference on Apple Silicon (M1/M2/M3/M4/M5), and I'm planning to add Windows support soon. Would love to hear your thoughts on the project.

This is the GitHub repo: https://github.com/akdeb/open-toys

submitted by /u/hwarzenegger