Real-time video captioning in the browser with LFM2-VL on WebGPU
Reddit r/LocalLLaMA / 3/14/2026
📰 News · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

> The model runs 100% locally in the browser with Transformers.js. Fun fact: I had to slow down frame capturing by 120ms because the model was too fast! Once I figure out a better UX so users can follow the generated captions more easily (less jumping), we can remove that delay. Suggestions welcome! Online demo (+ source code): https://huggingface.co/spaces/LiquidAI/LFM2-VL-WebGPU
Key Points
- The post showcases an LFM2-VL model running fully in-browser using WebGPU and Transformers.js for real-time video captioning, with no server-side inference.
- The author added a 120ms frame-capture delay because the model generated captions faster than viewers could follow them, and plans UX improvements to reduce caption jumping so the delay can eventually be removed.
- An online demo with source code is available on Hugging Face Spaces, enabling easy experimentation.
- This demonstrates a browser-centric AI inference workflow with on-device processing, privacy advantages, and web-based deployment.
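The capture-caption loop described above can be sketched roughly as follows. This is a minimal, hypothetical reconstruction, not the Space's actual source: the model identifier is a placeholder, and the `'image-to-text'` task name and `device: 'webgpu'` option follow the general Transformers.js pipeline API; check the linked Space for the exact setup.

```javascript
// Minimum interval between frame captures (the 120ms delay from the post).
const CAPTURE_DELAY_MS = 120;

// Pure helper: has enough time passed since the last capture?
function shouldCapture(lastCaptureMs, nowMs, minIntervalMs = CAPTURE_DELAY_MS) {
  return nowMs - lastCaptureMs >= minIntervalMs;
}

// Placeholder: substitute the LFM2-VL checkpoint the Space actually loads.
const MODEL_ID = "<lfm2-vl-onnx-checkpoint>";

// Browser-only sketch: caption frames from a <video> element in a loop.
async function runCaptioning(videoEl, onCaption) {
  // Dynamic import keeps this sketch loadable outside the browser.
  const { pipeline } = await import("@huggingface/transformers");

  // Run inference on the GPU via WebGPU instead of the WASM backend.
  const captioner = await pipeline("image-to-text", MODEL_ID, {
    device: "webgpu",
  });

  // Offscreen canvas used to snapshot the current video frame.
  const canvas = document.createElement("canvas");
  const ctx = canvas.getContext("2d");
  let last = 0;

  async function tick(now) {
    if (shouldCapture(last, now)) {
      last = now;
      canvas.width = videoEl.videoWidth;
      canvas.height = videoEl.videoHeight;
      ctx.drawImage(videoEl, 0, 0);
      // Pipelines accept image URLs; a data URL keeps everything local.
      const [result] = await captioner(canvas.toDataURL("image/jpeg"));
      onCaption(result.generated_text);
    }
    requestAnimationFrame(tick);
  }
  requestAnimationFrame(tick);
}
```

Throttling in `shouldCapture` rather than inside the model call keeps the UI loop simple: raising `CAPTURE_DELAY_MS` slows caption turnover (the readability fix from the post), and lowering it approaches the model's native speed.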