FishSpeech S2 Pro streaming code (380ms TTFA, tested on RTX 5090)

Reddit r/LocalLLaMA / 3/15/2026

Key Points

  • The FishSpeech S2 Pro streaming code achieves about 380ms TTFA on an RTX 5090 when using torch.compile, according to the author's test setup.
  • Tests show TTFA around 800ms without torch.compile, and 380ms with torch.compile on the same hardware and driver version.
  • The author outlines future optimizations to reduce memory usage, refine TTFA, and support longer prompts, including profiling, smaller first chunks, and CUDA graphs.
  • A PR (1193) and a schematic diagram are linked to illustrate the data flow and the direction of the work, with encouragement for others to adopt the approach.
So... uh... yes, I did a lot of debugging and learning. I'm your average webdev, not an ML engineer, so my apologies for the cursed code 🤣

https://github.com/fishaudio/fish-speech/pull/1193/changes

Streaming should work end-to-end with low TTFA (~400ms until first audio chunk on Arch Linux, RTX 5090, NVIDIA driver 595.45.04, 9950x3D); there’s still work to do on memory, TTFA, and longer prompts.

Here are some ideas:

  1. Figure out how to use torch.compile properly; right now it just recompiles after warmup on the smoke e2e test, and every recompile takes ~6 minutes.
  2. Stream tokens into vocoder with a schedule (per lengyue), not one big chunk.
  3. Cut memory use more and improve TTFA (profile, smaller first chunk, CUDA graphs).
  4. Support longer prompts (~30–50 words) without OOM; fixing #1 may address this.
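Idea #2 (streaming tokens into the vocoder on a schedule instead of one big chunk) could look roughly like the sketch below: a small first chunk keeps TTFA low, and later chunks grow for throughput. All names and sizes here are illustrative assumptions, not code from the PR.

```python
# Hypothetical ramped chunk schedule: small first vocoder call (low TTFA),
# geometrically growing later calls (better throughput). Not the PR's code.

def chunk_schedule(total_tokens: int, first: int = 16,
                   factor: int = 2, max_chunk: int = 256):
    """Yield chunk sizes: a small first chunk, then grow up to max_chunk."""
    emitted = 0
    size = first
    while emitted < total_tokens:
        take = min(size, total_tokens - emitted)
        yield take
        emitted += take
        size = min(size * factor, max_chunk)

# Example: 600 tokens -> chunks [16, 32, 64, 128, 256, 104]
print(list(chunk_schedule(600)))
```

The trade-off is that tiny chunks cost more per-call overhead in the vocoder, so the growth factor and cap would need tuning against real decode timings.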

I got a tiny bit of help from the maintainer, so my solution, while not especially impressive, should let others build in this direction.

This is an approximate diagram of what is actually happening:

https://preview.redd.it/hgwrc6azb5pg1.png?width=845&format=png&auto=webp&s=29995a0a8ee8a25f2ba2410e1544ac15d9d85ef3

This could be improved. As far as I understand, the DAC can process tokens on its own with some clever scheduling, rather than blocking the LLM until it actually finishes producing a PCM chunk 🤷
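That decoupling is essentially a producer/consumer pattern: the LLM keeps generating tokens while a separate vocoder worker drains a queue and decodes per chunk. A toy stdlib sketch (hypothetical names, stubbed "decode", not the PR's code):

```python
# Toy producer/consumer sketch: the LLM thread pushes tokens and is never
# blocked on PCM decoding; the vocoder thread drains the queue in chunks.
# Token generation and DAC decode are stubbed out for illustration.
import queue
import threading

token_q: "queue.Queue[int | None]" = queue.Queue(maxsize=1024)
pcm_chunks = []

def llm_producer(n_tokens: int):
    # Stand-in for autoregressive token generation.
    for t in range(n_tokens):
        token_q.put(t)
    token_q.put(None)  # sentinel: generation finished

def vocoder_consumer(chunk_size: int = 8):
    # Stand-in for DAC decode: buffer tokens, "decode" per chunk.
    buf = []
    while True:
        tok = token_q.get()
        if tok is None:
            break
        buf.append(tok)
        if len(buf) >= chunk_size:
            pcm_chunks.append(list(buf))  # pretend this is a PCM chunk
            buf.clear()
    if buf:
        pcm_chunks.append(list(buf))  # flush the final partial chunk

prod = threading.Thread(target=llm_producer, args=(20,))
cons = threading.Thread(target=vocoder_consumer)
prod.start(); cons.start()
prod.join(); cons.join()
print(len(pcm_chunks))  # 20 tokens at 8 per chunk -> 3 chunks
```

In a real pipeline the queue's `maxsize` gives you backpressure so the LLM can't run unboundedly ahead of the vocoder.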

Anyway, here's my tests.

Without torch.compile, TTFA is around 800ms:

https://preview.redd.it/1t1en4c0f5pg1.png?width=1622&format=png&auto=webp&s=8199dfc7ff4393ca06144df9a30a801101c1a2fa

With torch.compile (380ms), plus some logs / instrumentation:

https://preview.redd.it/b7rkejvan5pg1.png?width=2547&format=png&auto=webp&s=3dedb4f7745102b5b1aa77c06da897cfab6d0a73
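For reference, TTFA is just the delta between issuing the request and receiving the first audio chunk from the streaming generator. A minimal measurement sketch, with the real TTS generator replaced by a stub (this is not the PR's instrumentation):

```python
# Minimal TTFA measurement: stamp the request time, record the delta when
# the first chunk arrives. fake_stream() is a stub for the real generator.
import time

def fake_stream(n_chunks: int = 5, delay_s: float = 0.01):
    # Stand-in for the streaming TTS generator yielding PCM chunks.
    for _ in range(n_chunks):
        time.sleep(delay_s)
        yield b"\x00" * 320  # dummy PCM chunk

t0 = time.monotonic()
ttfa_ms = None
for i, chunk in enumerate(fake_stream()):
    if i == 0:
        ttfa_ms = (time.monotonic() - t0) * 1000.0

print(f"TTFA: {ttfa_ms:.1f} ms")
```

`time.monotonic()` is preferable to `time.time()` here because it can't jump backwards if the system clock is adjusted mid-measurement.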

I'm testing my own branch and found some issues, but the main streaming code should be working. There are also a lot of unrelated changes: QoL updates for adding reference voices, a Makefile, tests, etc.

submitted by /u/konovalov-nk