Streaming Structured Inference with Flash-SemiCRF

arXiv cs.LG · April 22, 2026


Key Points

  • Semi-Markov Conditional Random Fields (semi-CRFs) support segment-level labeling and exact boundary uncertainty, but prior implementations incur a prohibitive memory cost on long sequences because they explicitly materialize the edge potential tensor.
  • The paper introduces an on-the-fly strategy that replaces the stored edge tensor with prefix-sum lookups, cutting memory by a factor proportional to the product of maximum segment length and label count.
  • It presents a streaming forward-backward algorithm with checkpoint-boundary normalization that keeps working memory sublinear in sequence length while still producing exact gradients.
  • Numerical stability is improved using zero-centered cumulative scores, which also yield an adaptive duration prior when label imbalance is present.
  • These techniques are packaged into Flash-SemiCRF, a fused Triton kernel and codebase that enables exact semi-CRF inference on speech-scale and even genomic-scale sequence lengths.
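
The prefix-sum idea in the second bullet can be illustrated with a minimal sketch. This is hypothetical code, not taken from the paper's repository, and it assumes segment potentials decompose into sums of per-position label scores; the function names are illustrative:

```python
def prefix_sums(unaries):
    """Cumulative per-label scores: P[t][y] = sum of unaries[i][y] for i < t."""
    n_labels = len(unaries[0])
    P = [[0.0] * n_labels]
    for row in unaries:
        P.append([P[-1][y] + row[y] for y in range(n_labels)])
    return P

def segment_score(P, start, end, label):
    """Score of labeling positions [start, end) with `label`, in O(1).

    Replaces a stored edge tensor of size T x max_dur x n_labels with a
    (T+1) x n_labels prefix-sum array: each entry is recomputed on demand.
    """
    return P[end][label] - P[start][label]
```

Because any segment's score is a difference of two prefix-sum entries, the quadratic-in-duration edge tensor never needs to exist in memory.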

Abstract

Semi-Markov Conditional Random Fields (semi-CRFs) assign labels to segments of a sequence rather than to individual positions, enabling exact inference over segment-level features and principled uncertainty estimates at their boundaries. However, existing implementations must materialize a large edge potential tensor whose size grows with sequence length, maximum segment length, and label count, becoming prohibitive for speech-scale state spaces and intractable at genomic scales where sequences can exceed 100,000 positions. This memory bottleneck has limited the adoption of exact segment-level inference for long sequences and large label sets. We identify that the core inefficiency is materializing edge potentials that can instead be evaluated on-the-fly from a compact prefix-sum array, and make several improvements. First, replacing the stored edge tensor with prefix-sum lookup reduces the memory footprint by a factor proportional to the product of segment length and label count. Second, a streaming forward-backward pass with checkpoint-boundary normalization keeps working memory sublinear in sequence length while preserving exact gradients. Third, zero-centered cumulative scores control numerical drift and induce an adaptive duration prior under label imbalance. We integrate these ideas into Flash-SemiCRF, a fused Triton kernel that enables exact semi-CRF inference on previously intractable problem sizes. Available at https://github.com/biobenkj/flash-semicrf.
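
A streaming forward pass with checkpoint-boundary normalization, as described in the abstract, can be sketched as follows. This is a hypothetical pure-Python version, not the paper's Triton kernel: label transitions are omitted for brevity, and `streaming_log_partition`, `max_dur`, and `checkpoint` are illustrative names. Only the last `max_dur + 1` forward values are kept in a ring buffer; at checkpoint boundaries the buffer is renormalized and the shift folded into a scalar offset, so the log-partition function remains exact:

```python
import math
from collections import deque

def logsumexp(xs):
    m = max(xs)
    if m == -math.inf:
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))

def streaming_log_partition(unaries, max_dur, checkpoint=4):
    """Exact semi-CRF log-partition with O(max_dur) working memory."""
    T, n_labels = len(unaries), len(unaries[0])
    # Prefix sums give O(1) segment scores without a stored edge tensor.
    P = [[0.0] * n_labels]
    for row in unaries:
        P.append([P[-1][y] + row[y] for y in range(n_labels)])
    window = deque([0.0], maxlen=max_dur + 1)  # forward values minus `offset`
    offset = 0.0
    for t in range(1, T + 1):
        # Sum over all segments of duration d ending at t, for every label.
        terms = [window[-d] + P[t][y] - P[t - d][y]
                 for d in range(1, min(max_dur, t) + 1)
                 for y in range(n_labels)]
        window.append(logsumexp(terms))
        if t % checkpoint == 0:  # checkpoint-boundary normalization
            shift = max(window)
            offset += shift
            for i in range(len(window)):
                window[i] -= shift
    return window[-1] + offset
```

Since every value in the ring buffer shares the same accumulated offset, the renormalization changes no relative quantity, which is why the result (and, by extension, gradients through it) stays exact rather than approximate.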