So... uh... yes, I did a lot of debugging and learning. I'm your average web dev, not an ML engineer, so apologies for the cursed code 🤣 https://github.com/fishaudio/fish-speech/pull/1193/changes Streaming should work end-to-end with low TTFA (~400ms until the first audio chunk on Arch Linux, RTX 5090, NVIDIA driver 595.45.04, 9950X3D); there's still work to do on memory, TTFA, and longer prompts. Some ideas: profiling, smaller first chunks, and CUDA graphs.
I got a tiny bit of help from the maintainer, so my solution, while not really that impressive, should let others plumb in this direction. An approximate diagram of what's actually happening is linked from the post. This could be improved: as far as I can tell, DAC could just process tokens on its own with some clever scheduling, instead of holding the LLM until it actually finishes making a PCM chunk 🤷 Anyway, here are my tests: without torch.compile, TTFA is around 800ms; with it, around 380ms. I'm testing my own branch and found some issues, but the main streaming code should be working. There's also a lot of unrelated stuff, kinda QoL updates: adding reference voices, a Makefile, tests, etc.
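The post's suggestion that DAC could process tokens on its own without holding the LLM is essentially producer-consumer overlap: the LLM keeps emitting codec tokens in the background while the decoder turns completed chunks into PCM. A minimal sketch, assuming a hypothetical `generate_tokens` iterable (standing in for the LLM) and `decode_chunk` callable (standing in for the DAC decoder); neither name is the actual fish-speech API:

```python
import queue
import threading

def stream_tts(generate_tokens, decode_chunk, chunk_size=32):
    """Yield PCM chunks while token generation continues in the background.

    generate_tokens: iterable of codec tokens (hypothetical LLM stand-in).
    decode_chunk: turns a list of tokens into one PCM chunk (DAC stand-in).
    """
    token_q = queue.Queue(maxsize=256)

    def producer():
        # The "LLM" runs here and is never blocked by decoding.
        for tok in generate_tokens():
            token_q.put(tok)
        token_q.put(None)  # sentinel: generation finished

    threading.Thread(target=producer, daemon=True).start()

    buf = []
    while True:
        tok = token_q.get()
        if tok is None:
            break
        buf.append(tok)
        if len(buf) >= chunk_size:
            yield decode_chunk(buf)  # audio goes out as soon as possible
            buf = []
    if buf:  # flush the final partial chunk
        yield decode_chunk(buf)
```

With a real model, `decode_chunk` would run on the GPU; the point of the sketch is only the scheduling, i.e. that `token_q` decouples the two stages.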
FishSpeech S2 Pro streaming code (380ms TTFA, tested on RTX 5090)
Reddit r/LocalLLaMA / 3/15/2026
📰 News · Developer Stack & Infrastructure · Tools & Practical Usage
Key Points
- The FishSpeech S2 Pro streaming code achieves about 380ms TTFA on an RTX 5090 when using torch.compile, according to the author's test setup.
- Tests show TTFA around 800ms without torch.compile, and 380ms with torch.compile on the same hardware and driver version.
- The author outlines future optimizations to reduce memory usage, refine TTFA, and support longer prompts, including profiling, smaller first chunks, and CUDA graphs.
- A PR (1193) and a schematic diagram are linked to illustrate the data flow and the direction of the work, with encouragement for others to adopt the approach.
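One of the listed optimizations, "smaller first chunks," trades decoder efficiency for latency: a short first chunk gets audio playing sooner, after which chunk sizes can grow so the decoder is invoked less often. A minimal sketch of such a schedule, with purely illustrative parameters (none of these names or values come from the PR):

```python
def chunk_schedule(n_tokens, first=8, growth=2, max_chunk=64):
    """Return chunk sizes for n_tokens codec tokens.

    The first chunk is small (low TTFA); subsequent chunks grow
    geometrically up to max_chunk (better decoder throughput).
    All parameters are illustrative, not taken from fish-speech.
    """
    sizes, size, remaining = [], first, n_tokens
    while remaining > 0:
        take = min(size, remaining)
        sizes.append(take)
        remaining -= take
        size = min(size * growth, max_chunk)
    return sizes
```

For example, 100 tokens would be decoded as chunks of 8, 16, 32, and 44 tokens, so the first audio is ready after only 8 tokens instead of a full fixed-size chunk.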
Related Articles
I Was Wrong About AI Coding Assistants. Here's What Changed My Mind (and What I Built About It).
Dev.to
Qwen3.5-122B-A10B Uncensored (Aggressive) — GGUF Release + new K_P Quants
Reddit r/LocalLLaMA
The Best AI Tools for Digital Nomads in 2026
Dev.to
I Built the Most Feature-Complete MCP Server for Obsidian — Here's How
Dev.to
A supervisor or "manager" AI agent is the wrong way to control AI
Reddit r/artificial