Streaming Structured Inference with Flash-SemiCRF
arXiv cs.LG / 4/22/2026
Key Points
- Semi-Markov Conditional Random Fields (semi-CRFs) support segment-level labeling and exact boundary uncertainty, but prior implementations were limited by a prohibitive memory cost from explicitly materializing edge potential tensors as sequences get long.
- The paper introduces an on-the-fly strategy that replaces the large stored edge tensor with prefix-sum lookups, greatly reducing memory usage as a function of segment length and label count.
- It presents a streaming forward-backward algorithm with checkpoint-boundary normalization that keeps working memory sublinear in sequence length while still producing exact gradients.
- Numerical stability is improved using zero-centered cumulative scores, which also yield an adaptive duration prior when label imbalance is present.
- These techniques are packaged into Flash-SemiCRF, a fused Triton kernel and codebase that enables exact semi-CRF inference on speech-scale and even genomic-scale sequence lengths.
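The prefix-sum idea in the points above can be illustrated with a small NumPy sketch. It is not the Flash-SemiCRF code: the function names, the assumption that a segment's content score is the sum of per-position emission scores, and the choice to center by subtracting each label's mean over time are all hypothetical choices made for illustration. The sketch covers only the forward pass (the log-partition function) with a ring buffer of the last max_len rows; it does not reproduce the checkpointed backward pass or the fused Triton kernel.

```python
import numpy as np


def prefix_sums(emissions, center=False):
    """Return C of shape (T+1, Y) with C[t, y] = sum(emissions[:t, y]).

    center=True subtracts each label's mean over time first (an assumed
    reading of "zero-centered"), so every column of C ends at zero.
    """
    e = np.asarray(emissions, dtype=np.float64)
    if center:
        e = e - e.mean(axis=0, keepdims=True)
    C = np.zeros((e.shape[0] + 1, e.shape[1]))
    C[1:] = np.cumsum(e, axis=0)
    return C


def logsumexp(x):
    """Numerically stable log(sum(exp(x))) for a 1-D array."""
    m = np.max(x)
    if not np.isfinite(m):
        return m
    return m + np.log(np.sum(np.exp(x - m)))


def log_partition(C, transitions, max_len):
    """Semi-CRF forward pass using prefix-sum lookups for segment scores.

    A segment covering positions [s, t) with label y scores C[t, y] - C[s, y];
    no (T, max_len, Y, Y) edge tensor is stored. Only the last max_len
    forward rows are kept, in a ring buffer indexed by t % max_len.
    """
    T, Y = C.shape[0] - 1, C.shape[1]
    ring = np.full((max_len, Y), -np.inf)
    for t in range(1, T + 1):
        new_row = np.empty(Y)
        for y in range(Y):
            terms = []
            for d in range(1, min(max_len, t) + 1):
                s = t - d
                seg = C[t, y] - C[s, y]  # O(1) lookup instead of a stored edge
                if s == 0:
                    terms.append(seg)  # first segment: no incoming transition
                else:
                    prev = ring[s % max_len] + transitions[:, y]
                    terms.append(seg + logsumexp(prev))
            new_row[y] = logsumexp(np.array(terms))
        ring[t % max_len] = new_row  # written only after all reads for step t
    return logsumexp(ring[T % max_len])


def brute_force_log_partition(emissions, transitions, max_len):
    """Reference: enumerate every labeled segmentation (tiny inputs only)."""
    e = np.asarray(emissions, dtype=np.float64)
    T, Y = e.shape
    scores = []

    def extend(start, prev, score):
        if start == T:
            scores.append(score)
            return
        for d in range(1, min(max_len, T - start) + 1):
            for y in range(Y):
                seg = e[start:start + d, y].sum()
                trans = 0.0 if prev is None else transitions[prev, y]
                extend(start + d, y, score + seg + trans)

    extend(0, None, 0.0)
    return logsumexp(np.array(scores))
```

Under the centering assumption used here, a length-d segment with label y has its score shifted by d times that label's mean emission, which behaves like a label-dependent duration term. On a tiny input, log_partition(prefix_sums(e), transitions, max_len) can be checked against brute_force_log_partition(e, transitions, max_len).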