Streaming Structured Inference with Flash-SemiCRF

arXiv cs.LG · April 22, 2026


Key Points

  • Semi-Markov Conditional Random Fields (semi-CRFs) support segment-level labeling and exact boundary uncertainty, but prior implementations incur a prohibitive memory cost on long sequences because they explicitly materialize the edge potential tensor.
  • The paper introduces an on-the-fly strategy that replaces the stored edge tensor with prefix-sum lookups, cutting memory by a factor proportional to the product of maximum segment length and label count.
  • It presents a streaming forward-backward algorithm with checkpoint-boundary normalization that keeps working memory sublinear in sequence length while still producing exact gradients.
  • Numerical stability is improved using zero-centered cumulative scores, which also yield an adaptive duration prior when label imbalance is present.
  • These techniques are packaged into Flash-SemiCRF, a fused Triton kernel and codebase that enables exact semi-CRF inference on speech-scale and even genomic-scale sequence lengths.
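
The prefix-sum idea in the second bullet can be illustrated with a minimal sketch. This is hypothetical code, not taken from the paper's repository, and it assumes segment potentials decompose into sums of per-position label scores; the function names are illustrative:

```python
def prefix_sums(unaries):
    """Cumulative per-label scores: P[t][y] = sum of unaries[i][y] for i < t."""
    n_labels = len(unaries[0])
    P = [[0.0] * n_labels]
    for row in unaries:
        P.append([P[-1][y] + row[y] for y in range(n_labels)])
    return P

def segment_score(P, start, end, label):
    """Score of labeling positions [start, end) with `label`, in O(1).

    Replaces a stored edge tensor of size T x max_dur x n_labels with a
    (T+1) x n_labels prefix-sum array: each entry is recomputed on demand.
    """
    return P[end][label] - P[start][label]
```

Because any segment's score is a difference of two prefix-sum entries, the quadratic-in-duration edge tensor never needs to exist in memory.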

Abstract

Semi-Markov Conditional Random Fields (semi-CRFs) assign labels to segments of a sequence rather than to individual positions, enabling exact inference over segment-level features and principled uncertainty estimates at their boundaries. However, existing implementations must materialize a large edge potential tensor whose size grows with sequence length, maximum segment length, and label count, becoming prohibitive for speech-scale state spaces and intractable at genomic scales where sequences can exceed 100,000 positions. This memory bottleneck has limited the adoption of exact segment-level inference for long sequences and large label sets. We identify that the core inefficiency is materializing edge potentials that can instead be evaluated on-the-fly from a compact prefix-sum array, and make several improvements. First, replacing the stored edge tensor with prefix-sum lookup reduces the memory footprint by a factor proportional to the product of segment length and label count. Second, a streaming forward-backward pass with checkpoint-boundary normalization keeps working memory sublinear in sequence length while preserving exact gradients. Third, zero-centered cumulative scores control numerical drift and induce an adaptive duration prior under label imbalance. We integrate these ideas into Flash-SemiCRF, a fused Triton kernel that enables exact semi-CRF inference on previously intractable problem sizes. Available at https://github.com/biobenkj/flash-semicrf.
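
A streaming forward pass with checkpoint-boundary normalization, as described in the abstract, can be sketched as follows. This is a hypothetical pure-Python version, not the paper's Triton kernel: label transitions are omitted for brevity, and `streaming_log_partition`, `max_dur`, and `checkpoint` are illustrative names. Only the last `max_dur + 1` forward values are kept in a ring buffer; at checkpoint boundaries the buffer is renormalized and the shift folded into a scalar offset, so the log-partition function remains exact:

```python
import math
from collections import deque

def logsumexp(xs):
    m = max(xs)
    if m == -math.inf:
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))

def streaming_log_partition(unaries, max_dur, checkpoint=4):
    """Exact semi-CRF log-partition with O(max_dur) working memory."""
    T, n_labels = len(unaries), len(unaries[0])
    # Prefix sums give O(1) segment scores without a stored edge tensor.
    P = [[0.0] * n_labels]
    for row in unaries:
        P.append([P[-1][y] + row[y] for y in range(n_labels)])
    window = deque([0.0], maxlen=max_dur + 1)  # forward values minus `offset`
    offset = 0.0
    for t in range(1, T + 1):
        # Sum over all segments of duration d ending at t, for every label.
        terms = [window[-d] + P[t][y] - P[t - d][y]
                 for d in range(1, min(max_dur, t) + 1)
                 for y in range(n_labels)]
        window.append(logsumexp(terms))
        if t % checkpoint == 0:  # checkpoint-boundary normalization
            shift = max(window)
            offset += shift
            for i in range(len(window)):
                window[i] -= shift
    return window[-1] + offset
```

Since every value in the ring buffer shares the same accumulated offset, the renormalization changes no relative quantity, which is why the result (and, by extension, gradients through it) stays exact rather than approximate.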