FeatherOps: Fast fp8 matmul on RDNA3 without native fp8

Reddit r/LocalLLaMA / 3/22/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • FeatherOps demonstrates fast FP8 matrix multiplication on RDNA3 GPUs even without native FP8 support, achieving performance close to the hardware's theoretical maximum.
  • It is currently a proof-of-concept within ComfyUI, with potential applicability to LLM training kernels beyond just inference.
  • The project traces its lineage to the original Feather kernel proposed by u/Venom1806 (SuriyaaMM) and aims for further optimization.
  • GitHub and Reddit links are provided, indicating ongoing community collaboration and iterative development.

https://github.com/woct0rdho/ComfyUI-FeatherOps

I'm working on it in ComfyUI, and the kernel can also be used in LLM training.

Although RDNA3 GPUs do not have native fp8 support, we surprisingly see a speedup with fp8. It's really close to the theoretical peak performance of the hardware, unlike the fp16 matmul in ROCm, which only reaches about half of peak.
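The FeatherOps kernel itself lives in the linked repo; as a rough CPU-side illustration of how fp8 matmul can work at all on hardware without fp8 units (the function names, the e4m3fn format choice, and the lookup-table approach here are my own sketch, not the FeatherOps code), one can decode fp8 bytes to fp16 and feed an ordinary fp16/fp32 matmul:

```python
import numpy as np

def e4m3_to_float(byte):
    """Decode one fp8 e4m3fn byte (1 sign, 4 exponent, 3 mantissa, bias 7)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return float("nan")  # e4m3fn has no infinities; only this pattern is NaN
    if exp == 0:
        return sign * man * 2.0 ** -9  # subnormal: no implicit leading 1
    return sign * (1 + man / 8.0) * 2.0 ** (exp - 7)

# A 256-entry lookup table turns byte-stored fp8 weights into fp16 with a single
# gather per element -- the kind of cheap in-register conversion a GPU kernel can
# do before handing the data to the fp16 compute units.
LUT = np.array([e4m3_to_float(b) for b in range(256)], dtype=np.float16)

def fp8_matmul(a_bytes, b_bytes):
    """Emulated fp8 matmul: dequantize both operands, accumulate in fp32."""
    a = LUT[a_bytes].astype(np.float32)
    b = LUT[b_bytes].astype(np.float32)
    return a @ b
```

The win on RDNA3 presumably comes from fp8 halving the memory traffic while the conversion overlaps with the fp16 math, but that is a guess about the kernel's design, not something stated in the post.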

For now it's a proof of concept rather than a great speedup in ComfyUI. It's been a long journey since the original Feather kernel was proposed by u/Venom1806 (SuriyaaMM), and let's see how far it can be optimized.

submitted by /u/woct0rdho