WebSerial Vision Training for Microcontrollers: A Browser-Based Companion to On-Device CNN Training

arXiv cs.CV · April 28, 2026


Key Points

  • The paper introduces webmcu-vision-web, a single-file, zero-install browser app that enables end-to-end TinyML vision model training and deployment on a Seeed Studio XIAO ESP32-S3 Sense using only a Chromium-based browser.
  • It works as a local companion to on-device Arduino firmware, covering the full workflow from firmware flashing and image collection to CNN training, weight export, and live activation visualization, without sending data off the user’s machine.
  • The system supports in-browser firmware flashing (esptool-js), SD-card image browsing with preview and inline editing, and a config.json live-sync mechanism to adjust hyperparameters without recompiling.
  • Using TensorFlow.js, it reportedly finishes a three-class training run (about 30 images per class, 20 epochs) in ~1 minute in the browser versus ~9 minutes on-device, enabling an end-to-end cycle in under 10 minutes.
  • The authors validate stable convergence via a five-run consistency evaluation on a reference three-class problem and release all artifacts and MIT-licensed source code as a living template for adapting models to new hardware and tasks.
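The workflow above ends with exporting trained weights as myWeights.bin (for SD-card loading) and myWeights.h (for compiling into the firmware). The paper's exact file layout is not described in this summary, so the sketch below assumes a simple little-endian float32 concatenation, a common TinyML convention; `flattenWeights` is a hypothetical helper whose input would be the per-layer `Float32Array`s obtained from a TensorFlow.js model (e.g. via `getWeights()` and `dataSync()`).

```javascript
// Sketch: serialize trained weights to a raw float32 binary (myWeights.bin)
// and a C header (myWeights.h). Layout is an assumption, not the paper's
// documented format: all layer weights concatenated as little-endian float32.

// Flatten per-layer weight arrays into one Float32Array.
function flattenWeights(layers) {
  const total = layers.reduce((n, w) => n + w.length, 0);
  const flat = new Float32Array(total);
  let offset = 0;
  for (const w of layers) {
    flat.set(w, offset);
    offset += w.length;
  }
  return flat;
}

// Raw bytes for myWeights.bin (typed arrays are little-endian on
// effectively all browser platforms).
function toBin(flat) {
  return new Uint8Array(flat.buffer.slice(0));
}

// C header text for myWeights.h, so the weights can be compiled
// directly into the Arduino firmware.
function toHeader(flat, name = "myWeights") {
  const body = Array.from(flat, v => v.toPrecision(8) + "f").join(", ");
  return `// Auto-generated: ${flat.length} float32 weights\n` +
         `const float ${name}[${flat.length}] = { ${body} };\n`;
}
```

In the browser, the `.bin` bytes would typically be wrapped in a `Blob` and offered as a download, while the header string is saved as a text file.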

Abstract

This paper presents webmcu-vision-web, a single-file, zero-install browser application for end-to-end TinyML vision model training and deployment on the Seeed Studio XIAO ESP32-S3 Sense (XIAO ML Kit, $15–40 USD). Acting as a browser-based companion to the on-device Arduino firmware of Paper 1 [1], it provides a private, fully local machine learning pipeline, from firmware flashing through image collection, CNN training, weight export, and live activation visualization, without any software installation beyond a Chromium-based browser. The system targets educators, small businesses, and researchers who need to train task-specific visual classifiers under their exact deployment conditions. Key capabilities include: in-browser firmware flashing via esptool-js; an SD card file browser with image preview and inline editing; config.json live-sync for zero-recompile hyperparameter adjustment; webcam and ESP32 OV2640 camera image capture; TensorFlow.js CNN training completing a three-class run (~30 images per class, 20 epochs) in approximately 1 minute browser-side versus 9 minutes on-device, enabling a complete collect-train-deploy cycle in under 10 minutes; weight export as myWeights.bin and myWeights.h; a confusion matrix; and a live Conv2 activation heatmap streamed from the ESP32 during inference. No data leaves the local machine at any stage. A five-run consistency evaluation on the three-class reference problem (0Blank, 1Cup, 2Pen) demonstrates stable convergence with mean accuracy and standard deviation reported; all artifacts are released at the repository link below. The repository is a living template for LLM-assisted adaptation to new hardware and tasks. All source code is MIT-licensed at https://github.com/webmcu-ai/webmcu-vision-web.
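The config.json live-sync mechanism lets users change hyperparameters without recompiling the firmware: the browser writes an updated config.json to the SD card, which the firmware re-reads. The summary does not list the actual schema, so the fragment below is a hypothetical illustration; only the class names (0Blank, 1Cup, 2Pen) and the reference run's epoch and per-class image counts come from the source.

```json
{
  "epochs": 20,
  "images_per_class": 30,
  "learning_rate": 0.001,
  "classes": ["0Blank", "1Cup", "2Pen"]
}
```

Keeping training configuration in a plain file on the SD card means the same settings are visible to both the browser trainer and the on-device firmware, which is what makes the zero-recompile adjustment possible.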