TritonSigmoid: A fast, padding-aware sigmoid attention kernel for GPUs [R]

Reddit r/MachineLearning / 5/6/2026


Key Points

  • The article announces the open-sourcing of TritonSigmoid, a fast, padding-aware sigmoid-attention GPU kernel built with Triton.
  • It targets single-cell foundation models, where each cell is represented as a variable-length sequence of gene tokens; native padding handling avoids wasting compute on empty positions.
  • Experiments show TritonSigmoid achieves up to 515 TFLOPS on H100 GPUs, outperforming FlashAttention-2 (361 TFLOPS) and FlashSigmoid (440 TFLOPS).
  • The kernel also improves modeling quality: lower validation loss than softmax attention across six datasets, roughly 25% better cell-type separation, and stable training in settings where softmax diverges.
  • The work is shared with both an arXiv paper (2604.27124) and an accompanying GitHub repository for community use and feedback.

We are open-sourcing TritonSigmoid — a fast, padding-aware sigmoid attention kernel for GPUs.

We built this for single-cell foundation models, where every cell is represented as a sequence of gene tokens. A single gene can be regulated by many transcription factors at once; softmax forces those regulators to compete for a fixed attention budget, while sigmoid lets the model attend strongly to many tokens simultaneously. And because cells express anywhere from 200 to 16,000+ genes, the kernel handles variable-length sequences natively, so no compute is wasted on padded positions.
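If you haven't seen sigmoid attention before, here's a minimal plain-PyTorch sketch of the math. To be clear, this is not our kernel's API (the function name, signature, and bias default below are illustrative), and it still materializes the full score matrix and merely masks padding, whereas the Triton kernel fuses the whole computation and skips padded positions outright:

```python
import torch

def sigmoid_attention(q, k, v, key_padding_mask=None, bias=-9.0):
    """Reference sigmoid attention with a key-padding mask.

    q, k, v:          (batch, heads, seq, head_dim)
    key_padding_mask: (batch, seq) bool, True at padded positions
    bias:             scalar added to the logits before the sigmoid;
                      prior sigmoid-attention work (e.g. FlashSigmoid)
                      uses a negative bias on the order of -log(seq_len)
                      so attention mass starts small. -9 ~ -log(8192).
    """
    scale = q.shape[-1] ** -0.5
    logits = torch.einsum("bhqd,bhkd->bhqk", q, k) * scale + bias
    # Unlike softmax, each logit is squashed independently, so many keys
    # can receive high attention at once -- there is no competition for
    # a fixed probability budget.
    probs = torch.sigmoid(logits)
    if key_padding_mask is not None:
        # Zero out padded key positions so they contribute nothing.
        probs = probs.masked_fill(key_padding_mask[:, None, None, :], 0.0)
    return torch.einsum("bhqk,bhkd->bhqd", probs, v)


if __name__ == "__main__":
    b, h, n, d = 2, 8, 512, 64
    q, k, v = (torch.randn(b, h, n, d) for _ in range(3))
    pad = torch.zeros(b, n, dtype=torch.bool)
    pad[0, 300:] = True  # first cell only "expresses" 300 gene tokens
    out = sigmoid_attention(q, k, v, key_padding_mask=pad)
    print(out.shape)  # torch.Size([2, 8, 512, 64])
```

The negative bias matters: sigmoid scores don't sum to 1, so every key can contribute independently and the output magnitude grows with sequence length; prior sigmoid-attention work counteracts this by setting the logit bias near -log(n).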

What we found during our experiments:
• Hardware: Up to 515 TFLOPS on H100 (vs. FlashAttention-2 at 361 TFLOPS, FlashSigmoid at 440 TFLOPS)
• Accuracy: Lower validation loss than softmax attention across 6 held-out datasets
• Representation: 25% better cell-type separation
• Stability: Stable training where softmax catastrophically diverges

We would welcome any discussion or feedback.

Links to our work:
Paper: https://arxiv.org/abs/2604.27124
Code: https://github.com/MSDLLCpapers/triton-sigmoid

submitted by /u/vjysd