AI Navigate

Real-time video captioning in the browser with LFM2-VL on WebGPU

Reddit r/LocalLLaMA / 3/14/2026

📰 News · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • The post showcases an LFM2-VL model running fully offline in the browser, using WebGPU and Transformers.js for real-time video captioning.
  • The author added a 120ms frame-capture delay because the model generated captions faster than users could read them, and plans UX improvements (less caption jumping) so the delay can be removed.
  • An online demo with source code is available on HuggingFace Spaces, enabling easy experimentation.
  • This demonstrates a browser-centric AI inference workflow with on-device processing, privacy advantages, and web-based deployment.

The model runs 100% locally in the browser with Transformers.js. Fun fact: I had to slow down frame capturing by 120ms because the model was too fast! Once I figure out a better UX so users can follow the generated captions more easily (less jumping), we can remove that delay. Suggestions welcome!
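The capture-throttling idea described above can be sketched as follows. This is a minimal illustration, not the demo's actual source: the model id and the exact pipeline task/output shape are assumptions, and the browser-only parts (WebGPU, `<video>`, canvas, `requestAnimationFrame`) are shown but not executed here. Only the small gate helper, which enforces the 120ms minimum interval between captioned frames, is plain logic.

```javascript
// In the browser one would load the model via Transformers.js, e.g.
// (model id illustrative, task/output shape may differ for LFM2-VL):
//
//   import { pipeline } from '@huggingface/transformers';
//   const captioner = await pipeline(
//     'image-to-text', 'onnx-community/LFM2-VL-ONNX',
//     { device: 'webgpu' });

// Gate helper: returns a function that accepts a timestamp (ms) and
// answers true only when at least `minIntervalMs` have elapsed since
// the last accepted frame -- this is the "slow down by 120ms" trick.
function makeFrameGate(minIntervalMs) {
  let last = -Infinity;
  return (nowMs) => {
    if (nowMs - last >= minIntervalMs) {
      last = nowMs;
      return true;
    }
    return false;
  };
}

// Capture loop sketch (browser only): draw the current <video> frame
// onto a canvas and caption it whenever the gate opens.
async function captionLoop(video, canvas, captioner, gate) {
  const ctx = canvas.getContext('2d');
  const tick = async (t) => {
    if (gate(t)) {
      ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
      const frame = canvas.toDataURL('image/jpeg');
      // Output shape is an assumption; check the pipeline docs.
      const [out] = await captioner(frame);
      console.log(out.generated_text);
    }
    requestAnimationFrame(tick);
  };
  requestAnimationFrame(tick);
}
```

With `makeFrameGate(120)`, frames arriving sooner than 120ms after the last captioned one are simply skipped, which keeps the on-screen caption stable without blocking the render loop.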

Online demo (+ source code): https://huggingface.co/spaces/LiquidAI/LFM2-VL-WebGPU

submitted by /u/xenovatech