WebSerial Vision Training for Microcontrollers: A Browser-Based Companion to On-Device CNN Training

arXiv cs.CV · April 28, 2026


Key Points

  • The paper introduces webmcu-vision-web, a single-file, zero-install browser app that enables end-to-end TinyML vision model training and deployment on a Seeed Studio XIAO ESP32-S3 Sense using only a Chromium-based browser.
  • It works as a local companion to on-device Arduino firmware, covering the full workflow from firmware flashing and image collection to CNN training, weight export, and live activation visualization, without sending data off the user’s machine.
  • The system supports in-browser firmware flashing (esptool-js), SD-card image browsing with preview and inline editing, and a config.json live-sync mechanism to adjust hyperparameters without recompiling.
  • Using TensorFlow.js, it reportedly finishes a three-class training run (about 30 images per class, 20 epochs) in ~1 minute in the browser versus ~9 minutes on-device, enabling an end-to-end cycle in under 10 minutes.
  • The authors validate stable convergence via a five-run consistency evaluation on a reference three-class problem and release all artifacts and MIT-licensed source code as a living template for adapting models to new hardware and tasks.
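The workflow above ends with exporting trained weights as myWeights.bin (for SD-card loading) and myWeights.h (for compiling into the firmware). The paper's exact file layout is not described in this summary, so the sketch below assumes a simple little-endian float32 concatenation, a common TinyML convention; `flattenWeights` is a hypothetical helper whose input would be the per-layer `Float32Array`s obtained from a TensorFlow.js model (e.g. via `getWeights()` and `dataSync()`).

```javascript
// Sketch: serialize trained weights to a raw float32 binary (myWeights.bin)
// and a C header (myWeights.h). Layout is an assumption, not the paper's
// documented format: all layer weights concatenated as little-endian float32.

// Flatten per-layer weight arrays into one Float32Array.
function flattenWeights(layers) {
  const total = layers.reduce((n, w) => n + w.length, 0);
  const flat = new Float32Array(total);
  let offset = 0;
  for (const w of layers) {
    flat.set(w, offset);
    offset += w.length;
  }
  return flat;
}

// Raw bytes for myWeights.bin (typed arrays are little-endian on
// effectively all browser platforms).
function toBin(flat) {
  return new Uint8Array(flat.buffer.slice(0));
}

// C header text for myWeights.h, so the weights can be compiled
// directly into the Arduino firmware.
function toHeader(flat, name = "myWeights") {
  const body = Array.from(flat, v => v.toPrecision(8) + "f").join(", ");
  return `// Auto-generated: ${flat.length} float32 weights\n` +
         `const float ${name}[${flat.length}] = { ${body} };\n`;
}
```

In the browser, the `.bin` bytes would typically be wrapped in a `Blob` and offered as a download, while the header string is saved as a text file.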

Abstract

This paper presents webmcu-vision-web, a single-file, zero-install browser application for end-to-end TinyML vision model training and deployment on the Seeed Studio XIAO ESP32-S3 Sense (XIAO ML Kit, $15–40 USD). Acting as a browser-based companion to the on-device Arduino firmware of Paper 1 [1], it provides a private, fully local machine learning pipeline, from firmware flashing through image collection, CNN training, weight export, and live activation visualization, without any software installation beyond a Chromium-based browser. The system targets educators, small businesses, and researchers who need to train task-specific visual classifiers under their exact deployment conditions. Key capabilities include: in-browser firmware flashing via esptool-js; an SD card file browser with image preview and inline editing; config.json live-sync for zero-recompile hyperparameter adjustment; webcam and ESP32 OV2640 camera image capture; TensorFlow.js CNN training completing a three-class run (~30 images per class, 20 epochs) in approximately 1 minute browser-side versus 9 minutes on-device, enabling a complete collect-train-deploy cycle in under 10 minutes; weight export as myWeights.bin and myWeights.h; a confusion matrix; and a live Conv2 activation heatmap streamed from the ESP32 during inference. No data leaves the local machine at any stage. A five-run consistency evaluation on the three-class reference problem (0Blank, 1Cup, 2Pen) demonstrates stable convergence with mean accuracy and standard deviation reported; all artifacts are released at the repository link below. The repository is a living template for LLM-assisted adaptation to new hardware and tasks. All source code is MIT-licensed at https://github.com/webmcu-ai/webmcu-vision-web.
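The config.json live-sync mechanism lets users change hyperparameters without recompiling the firmware: the browser writes an updated config.json to the SD card, which the firmware re-reads. The summary does not list the actual schema, so the fragment below is a hypothetical illustration; only the class names (0Blank, 1Cup, 2Pen) and the reference run's epoch and per-class image counts come from the source.

```json
{
  "epochs": 20,
  "images_per_class": 30,
  "learning_rate": 0.001,
  "classes": ["0Blank", "1Cup", "2Pen"]
}
```

Keeping training configuration in a plain file on the SD card means the same settings are visible to both the browser trainer and the on-device firmware, which is what makes the zero-recompile adjustment possible.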