Mask2Flow-TSE: Two-Stage Target Speaker Extraction with Masking and Flow Matching
arXiv cs.AI / 3/16/2026
Key Points
- Mask2Flow-TSE is a two-stage target speaker extraction (TSE) framework that pairs discriminative masking with flow matching.
- The first stage applies a discriminative mask to obtain a coarse separation; the second stage uses flow matching to refine that output toward the target speech.
- Unlike generative TSE methods that synthesize speech from Gaussian noise and often require many iterative steps, Mask2Flow-TSE starts from the masked spectrogram to enable high-quality reconstruction in a single inference step.
- Experiments show the approach achieves performance comparable to existing generative methods with approximately 85 million parameters.
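
The two-stage idea in the key points can be illustrated with a toy NumPy sketch. This is not the paper's implementation: `coarse_mask`, `oracle_velocity`, and `integrate` are hypothetical names, the mask is a simple ratio mask driven by an assumed noisy estimate of the target, and the velocity field is an oracle that already knows the target, standing in for a trained network conditioned on the enrolled speaker. The sketch only shows the control flow: mask the mixture, then take a single Euler step of a flow that starts at the masked spectrogram rather than at Gaussian noise.

```python
import numpy as np

def coarse_mask(mixture_mag, target_hint, eps=1e-8):
    """Stage 1 stand-in (hypothetical): a ratio mask clipped to [0, 1]."""
    return np.clip(target_hint / (mixture_mag + eps), 0.0, 1.0)

def oracle_velocity(x, t, target_mag):
    """Stage 2 stand-in (hypothetical): velocity of the straight-line path
    x_t = (1 - t) * x0 + t * x1, which equals (x1 - x_t) / (1 - t).
    A real system would predict this with a learned network."""
    return (target_mag - x) / (1.0 - t)

def integrate(x0, velocity_fn, num_steps=1):
    """Euler integration from t = 0 to t = 1; num_steps=1 is one inference step."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t)
    return x

# Synthetic magnitude spectrograms (frequency bins x frames), for illustration only.
rng = np.random.default_rng(0)
target = rng.random((64, 50))
interferer = rng.random((64, 50))
mixture = target + interferer

# Assume stage 1 only has an imperfect estimate of the target.
hint = target + 0.3 * interferer
coarse = coarse_mask(mixture, hint) * mixture

# Refine in a single step, starting from the masked spectrogram.
refined = integrate(coarse, lambda x, t: oracle_velocity(x, t, target), num_steps=1)

print("mixture error:", np.abs(mixture - target).mean())
print("coarse error: ", np.abs(coarse - target).mean())
print("refined error:", np.abs(refined - target).mean())
```

Because the oracle velocity is exact for a straight-line path, one Euler step lands on the target in this toy setting; with a learned velocity field the single-step result would only approximate it, and how closely is what the paper's experiments evaluate.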