ggml-cuda: add flash-attn support for DKQ=320/DV=256 with ncols2=32 (… by lnigam · Pull Request #22286 · ggml-org/llama.cpp

Reddit r/LocalLLaMA / 4/29/2026

📰 News · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • The llama.cpp CUDA backend (ggml-cuda) has added FlashAttention support for a specific configuration: DKQ=320/DV=256 with ncols2=32.
  • The update is reported to improve CUDA performance for Mistral Small 4, where a CPU fallback previously reduced speed.
  • By enabling an optimized kernel path on the GPU, the change likely lowers latency and increases throughput compared to the earlier fallback behavior (see the sketch after this list).
  • The discussion notes speculation about whether the improvement could relate to upcoming Mistral releases, though no definitive connection is stated.
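As background on why a single shape needs its own kernel support: GPU FlashAttention kernels in ggml-cuda are compiled as specializations for particular head-size combinations, so a model whose attention heads use an unsupported shape gets routed to a slower fallback path. The snippet below is a minimal, hypothetical C++ sketch of that dispatch pattern; the function names and the list of supported shapes are illustrative assumptions, not the actual llama.cpp code.

```cpp
// Hypothetical illustration -- not the actual llama.cpp implementation.
// Models how a backend can instantiate attention kernels for a fixed set of
// (DKQ, DV) head-size pairs at compile time and fall back to a generic
// (e.g. CPU) path when a shape has no specialized kernel.
#include <cstdio>

// Compile-time specialized "kernel" for one (DKQ, DV) pair.
template <int DKQ, int DV>
void flash_attn_kernel() {
    std::printf("running specialized kernel: DKQ=%d DV=%d\n", DKQ, DV);
}

// Generic fallback used when no specialization matches.
void attention_fallback(int dkq, int dv) {
    std::printf("no specialized kernel for DKQ=%d DV=%d, using fallback\n", dkq, dv);
}

// Runtime dispatch over the set of supported head-size pairs.
// Supporting a new shape (such as 320/256) means adding one case here,
// which instantiates the corresponding specialized kernel.
void dispatch_flash_attn(int dkq, int dv) {
    if (dkq == 128 && dv == 128) { flash_attn_kernel<128, 128>(); return; }
    if (dkq == 192 && dv == 128) { flash_attn_kernel<192, 128>(); return; }
    if (dkq == 256 && dv == 256) { flash_attn_kernel<256, 256>(); return; }
    if (dkq == 320 && dv == 256) { flash_attn_kernel<320, 256>(); return; } // newly added pair
    attention_fallback(dkq, dv);
}

int main() {
    dispatch_flash_attn(128, 128); // already supported
    dispatch_flash_attn(320, 256); // previously hit the fallback, now specialized
    return 0;
}
```

In this simplified model, supporting DKQ=320/DV=256 amounts to adding one more specialized case so that models with that head geometry no longer take the slower fallback path.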

Improves the speed of Mistral Small 4 on CUDA

(there was a CPU fallback before)

(I wonder if it’s somehow related to the upcoming Mistral model? Maybe not)

submitted by /u/jacek2023