Liquid AI's LFM2-24B-A2B running at ~50 tokens/second in a web browser on WebGPU

Reddit r/LocalLLaMA / 3/26/2026

Key Points

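Liquid AI's LFM2-24B-A2B (an MoE model with 24B total and 2B active parameters) is reported to run in a web browser via WebGPU at about 50 tokens/second on an M4 Max.
On the same hardware, the 8B A1B variant is reported to exceed 100 tokens/second, a promising sign for the local inference experience.
A WebGPU demo on Hugging Face Spaces (LFM2-MoE-WebGPU) and optimized ONNX models (8B/24B) have been published.
This adds an option for trying MoE LLMs at practical speeds using only a browser, and may lower the barrier for developers to implement and test them.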

The model (MoE w/ 24B total & 2B active params) runs at ~50 tokens per second on my M4 Max, and the 8B A1B variant runs at over 100 tokens per second on the same hardware.

Demo (+ source code): https://huggingface.co/spaces/LiquidAI/LFM2-MoE-WebGPU
Optimized ONNX models:
- https://huggingface.co/LiquidAI/LFM2-8B-A1B-ONNX
- https://huggingface.co/LiquidAI/LFM2-24B-A2B-ONNX
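
For anyone who wants to compare their own hardware against the tokens-per-second figures quoted above, here is a minimal sketch of how such a number could be measured. It is not taken from the demo's source code: `tokensPerSecond` and `measureThroughput` are hypothetical helper names, and the token stream is assumed to be any async iterable that yields one item per generated token.

```typescript
// Hypothetical helpers for illustration; not part of the LFM2-MoE-WebGPU demo.

// Converts a token count and an elapsed time in milliseconds to tokens/second.
function tokensPerSecond(tokenCount: number, elapsedMs: number): number {
  if (elapsedMs <= 0) {
    throw new RangeError("elapsedMs must be positive");
  }
  return (tokenCount * 1000) / elapsedMs;
}

// Counts the items yielded by an async token stream and times the whole run.
// The clock is injectable so the function can be checked deterministically.
async function measureThroughput(
  stream: AsyncIterable<unknown>,
  now: () => number = () => performance.now(),
): Promise<{ tokens: number; elapsedMs: number; tokensPerSec: number }> {
  const start = now();
  let tokens = 0;
  for await (const _token of stream) {
    tokens += 1;
  }
  const elapsedMs = now() - start;
  return { tokens, elapsedMs, tokensPerSec: tokensPerSecond(tokens, elapsedMs) };
}
```

Note that this times the run from the moment the stream is first read, so prompt processing before the first token is included. A decode-only figure would need the clock started at the first token instead, and it is not stated which of the two the numbers in this post refer to.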

submitted by /u/xenovatech
